TOPICS IN NUMBER THEORY
VOLUMES I AND II

WILLIAM J. LEVEQUE




TOPICS IN 


NUMBER THEORY 
VOLUMES I AND II 


WILLIAM JUDSON LEVEQUE 


DOVER PUBLICATIONS, INC. 
Mineola, New York 


Copyright 


Copyright © 1956, 1984 by William Judson LeVeque 
All rights reserved under Pan American and International Copyright 
Conventions. 


Bibliographical Note 


This Dover edition, first published in 2002, is an unabridged republi- 
cation of the work published in two volumes by the Addison-Wesley 
Publishing Company, Inc., Reading, Massachusetts, 1956. The Errata 
List was prepared especially for this edition by the author. 


The two volumes contained in this book are paginated separately (and 
have separate tables of contents). Volume II begins following page 202 of 
Volume I. 


Library of Congress Cataloging-in-Publication Data 


LeVeque, William Judson. 
Topics in number theory / by William Judson LeVeque.—Dover ed. 
p. cm. 
Originally published: Reading, Mass., Addison-Wesley Pub., [1956]. 
Includes bibliographical references and index. 
ISBN 0-486-42539-8 (pbk. : set) 
1. Number theory. I. Title. 


QA241 .L58 2002 
512'.7--dc21
2002067433 


Manufactured in the United States of America 
Dover Publications, Inc., 31 East 2nd Street, Mineola, N.Y. 11501 


TOPICS IN 
NUMBER THEORY 


VOLUME I


To A. J. Kempner 


PREFACE 


The theory of numbers, one of the oldest branches of mathematics, 
has engaged the attention of many gifted mathematicians during the 
past 2300 years. The Greeks, Indians, and Chinese made significant 
contributions prior to 1000 A.D., and in more modern times the subject
has developed steadily since Fermat, one of the fathers of Western
mathematics. It is therefore rather surprising that there has never 
been a strong tradition in number theory in America, although a few 
men of the stature of L. E. Dickson have emerged to keep the flame 
alive. But in most American universities the theory of numbers is 
regarded as a slightly peripheral subject, which has an unusual flavor 
and unquestioned historical importance, but probably merits no more 
than a one-term course on the senior or first-year graduate level. It 
seems to me that this is an inappropriate attitude to maintain toward 
a subject which is flourishing in European hands, and which has 
contributed so much to the mathematics of the past and which 
promises exciting developments in the future. Changing its status is 
complicated, however, by the paucity of advanced works suitable 
for use as textbooks in American institutions. There are several 
excellent elementary texts available, and an ever-increasing number 
of monographs, mostly European, but to the best of my knowledge 
no general book designed for a second course in the theory of numbers 
has appeared since Dickson ceased writing. In Volume II of the 
present work I have attempted partially to fill this gap.

When I began to write Volume II, the number of introductory 
texts was very small, and no one of them contained all the informa- 
tion I found occasion to refer to. Since I had already written lecture 
notes for a first course, there seemed to be some advantage in expand- 
ing them into a more complete exposition of the standard elementary 
topics. Volume I is the result; it is designed to serve either as a self-contained
textbook for a one-term course in number theory, or as a
preliminary to the second volume. The two volumes together are 
intended to provide an introduction to some of the important tech- 
niques and results of classical and modern number theory; I hope 
they will prove useful as a first step in the training of students who are 
or might become seriously interested in the subject. 



In view of the diversity of problems and methods grouped together 
under the name of number theory, it is clearly impossible to write 
even an introductory treatment which in any sense covers the field 
completely. My choice of topics was made partly on the basis of my 
own taste and knowledge, of course, but also more objectively on the 
grounds of the technical importance of the methods developed or of the 
results obtained. It was this consideration which led me, for example, 
to give a standard function-theoretic proof of the Prime Number 
Theorem in the second volume: the analytic method has proved to 
be extremely powerful and is applicable to a large variety of problems, 
so that it must be considered an essential tool in the subject, while 
the elementary Erdős-Selberg method has found only limited applications,
and so for the time being must be regarded as an isolated
device, of great interest to the specialist but of secondary importance 
to the beginner. 

In a similar vein, I have on several occasions given proofs which 
are neither the shortest nor most elegant known, but which seem to 
me to be the most natural, or to lead to the deepest understanding of 
the phenomena under consideration. For example, the proof given 
in Chapter 8, Volume I, of Hurwitz’ theorem on the approximation 
of an irrational number by rationals is perhaps not as elegant as some 
others known, but of those which make no use of continued fractions, 
it is the only one I am familiar with which does not require prior
knowledge of the special role played by the number $\sqrt{5}$. To my
mind, these other proofs are inferior pedagogically, in that they give 
no hint as to how the student might attack a similar problem. 

Most of the material in the first volume is regularly included in 
various elementary courses, although it would probably be impossible 
to cover the entire volume in one semester. This allows the instructor 
to choose topics to suit his taste and, what is even more important 
for my general purpose, it presents the student with an opportunity 
for further reading in the subject. 

I consider the first volume suitable for presentation to advanced 
undergraduate and beginning graduate students, insofar as the dif- 
ficulty of the subject matter is concerned. No technical knowledge 
is assumed except in Section 3-5 and in Chapter 6, where calculus is 
used. On the other hand, elementary number theory is by no means 
easy, and that vaguely defined quality called mathematical maturity 
is of great value in developing a sound feeling for the subject. I
doubt, though, that it should be considered a prerequisite, even if it
could be measured; studying number theory is perhaps as good a 
way as any of acquiring it. 

Rather few of the problems occurring at the ends of sections are 
of the routine computational type; I assume that the student can 
devise such problems as well as I. It has been my experience that 
many of those included offer some difficulty to most students. For 
this reason I have appended hints in profusion, and have indicated 
by asterisks a few problems that remain more difficult than the 
average. 

The development of continued fractions in the final chapter may be 
sufficiently novel to warrant a word of explanation. I have chosen to 
regard as the basic problem that of finding the “good” rational approximations
to a real number, and have derived the regular continued
fraction as the solution. This procedure seems to me to be
pedagogically better than the classical treatment, in which one 
simply defines a continued fraction and verifies that the convergents 
have the requisite property. Moreover, this same approach looks 
promising for the corresponding problem of approximating complex 
numbers by the elements of a fixed quadratic field, while earlier 
attempts to define a useful complex continued fraction algorithm 
have been conspicuously unsuccessful. The idea of associating an 
interval with each Farey point is derived from work by K. Mahler,
who, with J. W. S. Cassels and W. Ledermann, investigated the much
more complicated Gaussian case [Philosophical Transactions of the
Royal Society, A (London) 243, 585-628 (1951)].

I am grateful to Professors T. Apostol, A. Brauer, B. W. Jones, and 
K. Mahler for their many constructive criticisms, to Mrs. Edith Fisher 
for her help in typing the manuscript, and to Mr. Earl Lazerson for 
his invaluable aid in proofreading. 

W. J. L.
Ann Arbor, Michigan 
November, 1955 


CONTENTS 


CHAPTER 1 INTRODUCTION
1-1 What is number theory?
1-2 Proofs
1-3 Radix representation

CHAPTER 2 THE EUCLIDEAN ALGORITHM AND ITS CONSEQUENCES
2-1 Divisibility
2-2 The Euclidean algorithm and greatest common divisor
2-3 The Unique Factorization Theorem
2-4 The linear Diophantine equation
2-5 The least common multiple

CHAPTER 3 CONGRUENCES
3-1 Introduction
3-2 Elementary properties of congruences
3-3 Residue classes and Euler’s $\varphi$-function
3-4 Linear congruences
3-5 Congruences of higher degree
3-6 Congruences with prime moduli
3-7 The theorems of Fermat, Euler, and Wilson

CHAPTER 4 PRIMITIVE ROOTS AND INDICES
4-1 Integers belonging to a given exponent (mod p)
4-2 Primitive roots of composite moduli
4-3 Indices
4-4 An application to Fermat’s conjecture

CHAPTER 5 QUADRATIC RESIDUES
5-1 Introduction
5-2 Composite moduli
5-3 Quadratic residues of primes, and the Legendre symbol
5-4 The law of quadratic reciprocity
5-5 An application
5-6 The Jacobi symbol

CHAPTER 6 NUMBER-THEORETIC FUNCTIONS AND THE DISTRIBUTION OF PRIMES
6-1 Introduction
6-2 The Möbius function
6-3 The function $[x]$
6-4 The symbols “O”, “o”, and “~”
6-5 The sieve of Eratosthenes
6-6 Sums involving primes
6-7 The order of $\pi(x)$
6-8 Bertrand’s conjecture
6-9 The order of magnitude of $\varphi$, $\sigma$, and $\tau$
6-10 Average order of magnitude
6-11 An application

CHAPTER 7 SUMS OF SQUARES
7-1 An approximation theorem
7-2 Sums of two squares
7-3 The Gaussian integers
7-4 The total number of representations
7-5 Sums of three squares
7-6 Sums of four squares

CHAPTER 8 PELL’S EQUATION AND SOME APPLICATIONS
8-1 Introduction
8-2 The case $N = \pm 1$
8-3 The case $|N| > 1$
8-4 An application
8-5 The minima of indefinite quadratic forms
8-6 Farey sequences, and a proof of Hurwitz’ theorem

CHAPTER 9 RATIONAL APPROXIMATIONS TO REAL NUMBERS
9-1 Introduction
9-2 The rational case
9-3 The irrational case
9-4 Quadratic irrationalities
9-5 Application to Pell’s equation
9-6 Equivalence of numbers

SUPPLEMENTARY READING
LIST OF SYMBOLS
INDEX
ERRATA

CHAPTER 1 


INTRODUCTION 


1-1 What is number theory? In number theory we are concerned 
with properties of certain of the integers 


Te ey aes ees Cay 


or sometimes with those properties of the real or complex numbers 
which depend rather directly on the integers. As in most branches of 
abstract thought, it is easier to characterize the theory of numbers 
extensively, by giving a large number of examples of problems which 
are usually considered parts of number theory, than to define it 
intensively, by saying that exactly those problems having certain 
characteristics will be included in the subject. Before considering 
such a list of types of problems, however, it might be worth while to 
make an exclusion. 

In the opinion of the author, the theory of numbers does not include 
the axiomatic construction or characterization either of systems of 
numbers (integers, rational numbers, real numbers, or complex 
numbers) or of the fundamental operations and relations in these 
sets. Toward the end of this chapter, a few properties of the integers 
are mentioned which the student may not have considered explicitly 
before; aside from these, no properties will be assumed beyond what 
any high-school pupil knows. It is, of course, quite possible that the 
student will not have read a logical treatment of elementary arith- 
metic; if he wishes to do so, he might examine E. Landau’s elegant 
Foundations of Analysis (New York: Chelsea Publishing Company, 
1951), but he should not expect to find a treatment of this kind 
here. The contents of such a book are, in a sense, assumed to be 
known to the reader, but as far as understanding number theory is 
concerned, this assumption is of little consequence. 

The problems treated in number theory can be divided into groups 
according to a more or less rough classification. First, there are 
multiplicative problems, concerned with divisibility properties of the 
integers. It will be proved later that any positive integer n greater 
than 1 can be represented uniquely, except for the order of the factors,
as a product of primes, a prime being any integer greater than 1 having
no exact divisors except itself and 1. This might almost be termed
the Fundamental Theorem of number theory, so manifold and varied
are its applications. From the decomposition of n into primes, it is
easy to determine the number of divisors of n. This number is called
$\tau(n)$ by some writers and $d(n)$ by others; we shall use the former
designation. The behavior of $\tau(n)$ is very erratic; the first few values
are as follows:


 n    $\tau(n)$       n    $\tau(n)$
 1        1          13        2
 2        2          14        4
 3        2          15        4
 4        3          16        5
 5        2          17        2
 6        4          18        6
 7        2          19        2
 8        4          20        6
 9        3          21        4
10        4          22        4
11        2          23        2
12        6          24        8
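The remark that the number of divisors is easy to read off from the prime decomposition can be sketched in code. The sketch below assumes the standard rule, which the text only alludes to at this point: if $n = p_1^{a_1} \cdots p_k^{a_k}$, a divisor is fixed by choosing an exponent between 0 and $a_i$ for each prime, so $\tau(n) = (a_1 + 1) \cdots (a_k + 1)$. The function names `factorize` and `tau` are illustrative, not the author's.

```python
def factorize(n: int) -> dict[int, int]:
    """Return {prime: exponent} for n >= 1, using trial division."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # whatever remains after trial division is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def tau(n: int) -> int:
    """Number of positive divisors of n: the product of (exponent + 1)."""
    count = 1
    for exponent in factorize(n).values():
        count *= exponent + 1
    return count


if __name__ == "__main__":
    # Print the values of tau(n) for n = 1, ..., 24.
    for n in range(1, 25):
        print(n, tau(n))
```

Trial division is only practical for small n; it is used here because it keeps the link to the prime decomposition visible.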


If $n = 2^m$, the divisors of n are $1, 2, 2^2, \ldots, 2^m$, so that $\tau(2^m) = m + 1$.
On the other hand, if n is a prime, then $\tau(n) = 2$. Since, as we shall
see, there are infinitely many primes, it appears that the $\tau$-function
has arbitrarily large values, and yet has the value 2 for infinitely
many n. Many questions might occur to anyone who thinks about
the subject for a few moments and studies the above table. For
example,

(a) Is it true that $\tau(n)$ is odd if and only if n is a square?

(b) Is it always true that if m and n have no common factor, then
$\tau(m)\tau(n) = \tau(mn)$?

(c) Do the arguments of the form $2^m$ give the relatively largest
values of the $\tau$-function? That is, is the inequality

$$\tau(n) \le \frac{\log n}{\log 2} + 1$$

correct for all n? If not, is there any better upper bound than the
trivial one, $\tau(n) \le n$?




(d) How large is $\tau(n)$ on the average? That is, what can be said
about the quantity

$$\frac{1}{N} \sum_{n=1}^{N} \tau(n)$$

as N increases indefinitely?

(e) For large N, approximately how many solutions $n \le N$ are
there of the equation $\tau(n) = 2$? In other words, about how many
primes are there among the integers 1, 2, ..., N?
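Questions of this kind are easy to explore numerically before any proof is attempted. The following sketch tests (a) and (b) over a small range, searches for counterexamples to the logarithmic bound in (c), and computes the average in (d). The search limits are arbitrary choices, and a finite check is of course evidence only, not a proof.

```python
import math


def tau(n: int) -> int:
    """Number of positive divisors of n, by direct counting."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


LIMIT = 200  # arbitrary search range for this illustration

# (a) tau(n) is odd exactly when n is a perfect square.
odd_iff_square = all(
    (tau(n) % 2 == 1) == (math.isqrt(n) ** 2 == n) for n in range(1, LIMIT + 1)
)

# (b) tau(m) * tau(n) == tau(m * n) whenever m and n have no common factor.
multiplicative = all(
    tau(m) * tau(n) == tau(m * n)
    for m in range(1, 40)
    for n in range(1, 40)
    if math.gcd(m, n) == 1
)

# (c) integers that violate tau(n) <= log(n)/log(2) + 1.
counterexamples = [n for n in range(1, LIMIT + 1) if tau(n) > math.log2(n) + 1]


# (d) the average value of tau over 1..N.
def average_tau(N: int) -> float:
    return sum(tau(n) for n in range(1, N + 1)) / N


if __name__ == "__main__":
    print("(a) holds up to", LIMIT, ":", odd_iff_square)
    print("(b) holds in the tested range:", multiplicative)
    print("(c) first counterexamples:", counterexamples[:5])
    for N in (10, 100, 1000):
        print("(d) average up to", N, "is", average_tau(N))
```

In this range the bound in (c) already fails at n = 6, where $\tau(6) = 4$, so powers of 2 do not give the relatively largest values.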

Of the above questions, which are fairly typical problems in 
multiplicative number theory, the first is very easy to answer in the 
affirmative. The next three are more difficult; they will be considered 
in Chapter 6. The last is very difficult indeed. It was conjectured 
by C. F. Gauss and A. Legendre, two of the greatest of number 
theorists, that the number $\pi(N)$ of primes not exceeding N is approximately
$N/\log N$, in the sense that the relative error

$$\frac{|\pi(N) - (N/\log N)|}{N/\log N} = \left| \frac{\pi(N)}{N/\log N} - 1 \right|$$

is very small when N is sufficiently large. Many years later
(1852-54), P. L. Chebyshev showed that if this relative error has any
limiting value, it must be zero, but it was not until 1896 that J. Hadamard
and C. de la Vallée Poussin finally proved what is now called the
Prime Number Theorem, that

$$\lim_{N \to \infty} \frac{\pi(N)}{N/\log N} = 1.$$
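The slow approach of this ratio to 1 can be observed directly. The sketch below counts primes with a sieve of Eratosthenes and prints the ratio of the prime count to N/log N for a few values of N. The helper names and the sample values of N are choices made for this illustration.

```python
import math


def prime_count(N: int) -> int:
    """Number of primes not exceeding N (N >= 1), by the sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (N + 1)
    is_prime[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, math.isqrt(N) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, N + 1, p)))
    return sum(is_prime)


def pnt_ratio(N: int) -> float:
    """The ratio pi(N) / (N / log N), which the theorem says tends to 1."""
    return prime_count(N) / (N / math.log(N))


if __name__ == "__main__":
    for N in (10**2, 10**3, 10**4, 10**5, 10**6):
        print(N, prime_count(N), round(pnt_ratio(N), 4))
```

The ratio stays above 1 throughout this range and decreases only slowly, which gives some feeling for why the theorem resisted proof for so long.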


In another direction, we have the problems of additive number 
theory: questions concerning the representability, and the number of 
representations, of a positive integer as a sum of integers of a specified
kind. For instance, upon examination it appears that some integers,
like $5 = 1^2 + 2^2$ and $13 = 2^2 + 3^2$, are representable as a sum of
two squares, while others, like 3 or 12, are not. Which integers are
so representable, and how many such representations are there? 
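A short search makes the question concrete. The sketch below lists the representations of n as a sum of two squares. It assumes one particular counting convention (non-negative integers, order ignored, zero allowed as a summand), which may differ from the convention adopted when the question is treated in full.

```python
import math


def two_square_representations(n: int) -> list[tuple[int, int]]:
    """All pairs (a, b) with 0 <= a <= b and a*a + b*b == n."""
    representations = []
    a = 0
    while 2 * a * a <= n:  # a <= b forces 2*a^2 <= n
        remainder = n - a * a
        b = math.isqrt(remainder)
        if b * b == remainder:
            representations.append((a, b))
        a += 1
    return representations


if __name__ == "__main__":
    for n in (3, 5, 12, 13, 25, 50):
        print(n, two_square_representations(n))
```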

A third category might include what are known as Diophantine 
equations, named after the Greek mathematician Diophantos, who 
first studied them. These are equations in one or more variables 
whose solutions are required to be integers, or at any rate rational
numbers. For example, it is a familiar fact that $3^2 + 4^2 = 5^2$,
which gives us a solution of the Diophantine equation $x^2 + y^2 = z^2$.
Giving a particular solution is hardly of interest; what is desired is an
explicit formula for all solutions. A very famous Diophantine
equation is that known as Fermat’s equation: $x^n + y^n = z^n$.
P. Fermat asserted that this equation has no solution (in nonzero
integers, of course) if $n \ge 3$; the assertion has never been proved or
disproved for general n. There is at present practically no general 
theory of Diophantine equations, although there are many special 
methods, most of which were devised for the solution of particular 
equations. 
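To illustrate what an explicit formula for all solutions looks like, the sketch below uses the classical parametrization of the equation $x^2 + y^2 = z^2$, which is not stated in this passage and is assumed here as a standard result: every primitive solution in positive integers has, up to interchanging x and y, the form $x = m^2 - n^2$, $y = 2mn$, $z = m^2 + n^2$ with $m > n > 0$, m and n coprime and of opposite parity. The function name is illustrative.

```python
import math


def primitive_triples(limit: int) -> list[tuple[int, int, int]]:
    """Primitive solutions of x^2 + y^2 = z^2 with z <= limit.

    Generated from the assumed parametrization
    x = m^2 - n^2, y = 2mn, z = m^2 + n^2.
    """
    triples = []
    m = 2
    while m * m + 1 <= limit:  # smallest z for this m is m^2 + 1
        for n in range(1, m):
            if (m - n) % 2 == 1 and math.gcd(m, n) == 1:
                x, y, z = m * m - n * n, 2 * m * n, m * m + n * n
                if z <= limit:
                    triples.append((x, y, z))
        m += 1
    return triples


if __name__ == "__main__":
    for triple in primitive_triples(30):
        print(triple)
```

All other solutions in positive integers are integer multiples of these, which is what makes the formula a description of every solution rather than of particular ones.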

Finally, there are problems in Diophantine approximations. For 
example, given a real number $\xi$ and a positive integer N, find that
rational number p/q for which $q \le N$ and $|\xi - (p/q)|$ is minimal.
The proofs that e and $\pi$ are transcendental also fall in this category.
This branch of number theory probably borrows the most from, and 
contributes the most to, other branches of mathematics. 
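The approximation problem just stated can be solved by brute force for small N, which is a useful way to get a feeling for it. The sketch below tries every denominator q up to N and, for each, the numerator p nearest to q times the target. It treats the floating-point input as an exact rational, so it approximates the machine value rather than the true real number; that limitation and the function name are assumptions of this illustration.

```python
import math
from fractions import Fraction


def best_approximation(xi: float, N: int) -> Fraction:
    """Return p/q with 1 <= q <= N minimizing |xi - p/q|.

    Ties are resolved in favor of the smaller denominator.
    """
    target = Fraction(xi)  # exact rational value of the float
    best = None
    for q in range(1, N + 1):
        p = round(target * q)  # nearest numerator for this denominator
        candidate = Fraction(p, q)
        if best is None or abs(target - candidate) < abs(target - best):
            best = candidate
    return best


if __name__ == "__main__":
    for N in (1, 10, 120):
        print(N, best_approximation(math.pi, N))
```

The search costs one step per denominator. Continued fractions, developed in the final chapter, reach the same good approximations far more economically.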

The theorems of number theory can also be subdivided along 
entirely different lines—for example, according to the methods used 
in their proofs. Thus the dichotomies of elementary and nonelementary,
analytic and synthetic. A proof is said to be elementary
(although not necessarily simple!) if it makes no use of the theory of 
functions of a complex variable, and synthetic if it does not involve 
the usual concepts of analysis—limits, continuity, etc. Sometimes, 
but not always, the nature of the theorem shows that the proof will 
be in one or another of these categories. For example, the above-mentioned
theorem about $\pi(x)$ is clearly a theorem of analytic
number theory, but it was not until 1948 that an elementary proof
was found. On the other hand, the following theorem, first proved 
by D. Hilbert, involves in its statement none of the concepts of 
analysis, yet the only proofs known prior to 1942 were analytic: 
Given any positive integer k, there is another integer s, depending 
only on k, such that every positive integer is representable as the sum 
of at most s kth powers, i.e., such that the equation 


$$n = x_1^k + x_2^k + \cdots + x_s^k$$

is solvable in non-negative integers $x_1, \ldots, x_s$ for every n.
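Hilbert's theorem asserts only that a suitable s exists for each k, but small cases can be explored by machine. The sketch below computes, for every n up to a limit, the least number of positive kth powers whose sum is n. The limits are arbitrary, the function name is illustrative, and a finite table of this kind says nothing about all n.

```python
def fewest_kth_powers(limit: int, k: int) -> list[int]:
    """best[n] = least number of positive kth powers summing to n, 0 <= n <= limit."""
    powers = []
    x = 1
    while x**k <= limit:
        powers.append(x**k)
        x += 1
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # Remove one kth power p and reuse the answer already found for n - p.
        best[n] = 1 + min(best[n - p] for p in powers if p <= n)
    return best


if __name__ == "__main__":
    squares = fewest_kth_powers(500, 2)
    cubes = fewest_kth_powers(300, 3)
    print("most squares needed up to 500:", max(squares))
    print("most cubes needed up to 300:", max(cubes))
```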
It may seem strange at first that the theory of functions of a complex
variable is useful in treating arithmetic problems, since there is,
prima facie, nothing common to the two disciplines. Even after we 
understand how function theory can be used, we must still reconcile 
ourselves to the rather disquieting thought that it apparently must 
be used in some problems—that there is, at present, simply no other 
way to deal with them. What is perhaps not a familiar fact to the 
general reader is that function theory is only one of many branches 
of mathematics which are at best only slightly related to number 
theory, but which enter in an essential way into number-theoretic 
considerations. This is true, for example, of abstract algebra, prob- 
ability, Euclidean and projective geometry, topology, the theory of 
Fourier series, differential equations, and elliptic and other auto- 
morphic functions. In particular, it would appear that the rather 
common subsumption of number theory under algebra involves a 
certain distortion of the facts. 


1-2 Proofs. It is a well-known phenomenon in mathematics that 
an excessively simple theorem frequently is difficult to prove (al- 
though the proof, in retrospect, may be short and elegant) just 
because of its simplicity. This is probably due in part to the lack 
of any hint in the statement of the theorem as to the machinery to be 
used in proving it, and in part to the lack of available machinery. 
Since many theorems of elementary number theory are of this kind, 
and since there is considerable diversity in the types of arguments 
used in their proofs, it might not be amiss to discuss the subject 
briefly. 

First a psychological remark. If we are presented with a rather 
large number of theorems bearing on the same subject but proved by 
quite diverse means, the natural tendency is to regard the techniques 
used in the various proofs as special tricks, each applicable only to 
the theorem with which it is associated. A technique ceases to be a 
trick and becomes a method only when it has been encountered enough 
times to seem natural; correspondingly, a subject may be regarded 
as a “bag of tricks” if the relative number of techniques to results is
too high. Unfortunately, elementary number theory has sometimes 
been regarded as such a subject. On working longer in the field, 
however, we find that many of the tricks become methods, and that 
there is more uniformity than is at first apparent. By making a
conscious effort to abstract and retain the germs of the proofs that
follow, the reader will begin to see patterns emerging sooner than he 
otherwise might. 

Consider, for example, the assertion that $\tau(n)$ is even unless n is a
square, i.e., the square of another integer. A proof of this is as
follows: If d is a divisor of n, then so is the integer n/d. If n is not a
square, then $d \ne n/d$, since otherwise $n = d^2$. Hence, if n is not a
square, its divisors can be paired off into couples d, n/d, so that each 
divisor of n occurs just once as an element of some one of these 
couples. The number of divisors is therefore twice the number of 
couples and, being twice an integer, is even. 

The principle here is that when we want to count the integers having
a certain property (here “count” may also be replaced by “add”),
it may be helpful first to group them in judicious fashion. There are
several problems in the present book whose solutions depend on this
idea.
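The pairing argument translates directly into a way of counting divisors. The sketch below lists the couples d, n/d with d no larger than n/d and counts two divisors for each couple, except when the two members coincide, which happens only for a square. The function names are illustrative.

```python
import math


def divisor_pairs(n: int) -> list[tuple[int, int]]:
    """The couples (d, n // d) of divisors of n, listed with d <= n // d."""
    return [(d, n // d) for d in range(1, math.isqrt(n) + 1) if n % d == 0]


def tau_by_pairing(n: int) -> int:
    """Count divisors two at a time; a couple with d == n // d counts once."""
    count = 0
    for d, partner in divisor_pairs(n):
        count += 1 if d == partner else 2
    return count


if __name__ == "__main__":
    print(divisor_pairs(12), tau_by_pairing(12))
    print(divisor_pairs(36), tau_by_pairing(36))
```

Besides illustrating the proof, the grouping shortens the work: only candidates up to the square root of n need to be tried.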

In addition to the special methods appropriate to number theory, 
we shall have many occasions to use two quite general types of proof 
with which the student may not have had much experience: proofs 
by contradiction and proofs by induction. 

An assertion P is said to have been proved by contradiction if it 
has been shown that, by assuming P false, we can deduce an assertion 
Q which is known to be incorrect or which contradicts the assumption 
that P is false. As an example, consider the theorem (known as 
early as the time of Euclid) that there are infinitely many prime num- 
bers. To prove this by contradiction, we assume the opposite, 
namely, that there are only finitely many primes. Let these be
$p_1, p_2, \ldots, p_k$; let N be the integer $p_1 p_2 \cdots p_k + 1$; and let Q be the
assertion that N is divisible by some prime different from any of
$p_1, p_2, \ldots, p_k$. Now N is divisible by some prime p (if N is itself
prime, then p = N), and N is not divisible by any of the $p_1, p_2, \ldots, p_k$,
since each of these primes leaves a remainder of 1 when divided into
N. Hence Q is true. Since Q is not compatible with the falsity of the 
theorem, the theorem is true. 
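The construction in this proof can be carried out for any finite list of primes. The sketch below forms the product of the list plus 1 and finds a prime factor of the result, which necessarily lies outside the list. The function names are illustrative; note that the number N itself need not be prime.

```python
def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n, for n >= 2 (trial division)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n  # no divisor up to sqrt(n), so n is prime


def euclid_new_prime(primes: list[int]) -> tuple[int, int]:
    """Return (N, p): N = product of the primes + 1, and a prime p dividing N."""
    N = 1
    for q in primes:
        N *= q
    N += 1
    return N, smallest_prime_factor(N)


if __name__ == "__main__":
    print(euclid_new_prime([2, 3, 5]))
    print(euclid_new_prime([2, 3, 5, 7, 11, 13]))
```

For the first six primes the number N is 30031 = 59 · 509, which is composite, yet both of its prime factors are new.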

As for proofs by induction, let P(n) be a statement involving an 
integral variable n; we wish to prove that P(n) is valid for every 
integer n not less than a particular one, say $n_0$. The induction
principle says that if $P(n_0)$ is valid, and if, for every $n \ge n_0$, one can
deduce P(n + 1) by assuming that P(n) is valid, then P(n) is valid
for every integer $n \ge n_0$. The statement $P(n_0)$ must, of course, be
proved independently; usually, though not always, by direct verification.
The difficulty, if any, normally lies in showing that P(n)
implies P(n + 1).

As an example, let us undertake to prove that the formula 


$$1 + 2 + \cdots + n = \frac{n(n+1)}{2} \tag{1}$$

is correct, whatever the positive integer n may be. Here $n_0 = 1$.
There are three steps:

(a) Show that the formula is correct when n = 1. This is trivial
here.

(b) Show that if n is an integer for which (1) holds, the same is
true of n + 1. But if (1) holds, then adding n + 1 to both sides gives

$$1 + 2 + \cdots + n + (n+1) = \frac{n(n+1)}{2} + (n+1) = \frac{(n+1)(n+2)}{2},$$

and this is simply (1) with n replaced by n + 1, so that P(n) implies
P(n + 1).

(c) Use the principle of induction to deduce that (1) holds for 
every positive integer n. 
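The two ingredients of the argument, the base case and the passage from n to n + 1, can be mirrored in a small check of the formula 1 + 2 + ... + n = n(n + 1)/2. The sketch is a finite sanity check over a range chosen arbitrarily; it is the induction principle, not the check, that extends the result to every n.

```python
def sum_formula(n: int) -> int:
    """The closed form n(n + 1)/2 claimed for 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def check_induction(up_to: int) -> bool:
    """Verify the base case and the inductive step for n = 1, ..., up_to - 1."""
    if sum_formula(1) != 1:  # step (a): the case n = 1
        return False
    for n in range(1, up_to):
        # step (b): adding n + 1 to the formula for n must give the formula for n + 1
        if sum_formula(n) + (n + 1) != sum_formula(n + 1):
            return False
    return True


if __name__ == "__main__":
    print(check_induction(1000))
    print(sum_formula(100), sum(range(1, 101)))
```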

As a second example, consider the Fibonacci sequence 


$$1, 2, 3, 5, 8, 13, 21, \ldots,$$

in which every element after the second is the sum of the two numbers
immediately preceding it. If we denote by $u_n$ the nth element of this
sequence, then the sequence is recursively defined by the conditions

$$u_1 = 1, \quad u_2 = 2, \quad u_n = u_{n-1} + u_{n-2}, \quad n \ge 3. \tag{2}$$

We may verify, for as large n as we like, that

$$u_n < (7/4)^n.$$

To prove this for all positive integral n, we take for P(n) the following
statement: the inequalities

$$u_n < (7/4)^n \quad \text{and} \quad u_{n+1} < (7/4)^{n+1} \tag{3}$$


hold. This is clearly equivalent to the earlier statement. We repeat 
the three steps. 


(a) For n = 1, P(n) reduces to the assertion that $1 < 7/4$ and
$2 < (7/4)^2$.

(b) The induction hypothesis now is that $u_n < (7/4)^n$ and
$u_{n+1} < (7/4)^{n+1}$, where n is a positive integer. Since $n + 2 \ge 3$, we
have that

$$u_{n+2} = u_{n+1} + u_n,$$

by (2). Hence

$$u_{n+2} < (7/4)^{n+1} + (7/4)^n = (7/4)^n (7/4 + 1) < (7/4)^n (7/4)^2 = (7/4)^{n+2},$$

and this inequality, together with the induction hypothesis, shows that

$$u_{n+1} < (7/4)^{n+1} \quad \text{and} \quad u_{n+2} < (7/4)^{n+2},$$

so that P(n) implies P(n + 1).

(c) By the induction principle, it follows that the inequality (3) 
holds for all positive integral n. 
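Both the inequality and the single numerical fact on which the inductive step rests can be checked with exact rational arithmetic. The sketch below assumes the bound in question is $(7/4)^n$, as the inductive step indicates, and uses the sequence defined by $u_1 = 1$, $u_2 = 2$, $u_n = u_{n-1} + u_{n-2}$. The function names are illustrative.

```python
from fractions import Fraction

BOUND = Fraction(7, 4)  # assumed base of the bound u_n < (7/4)^n


def sequence(count: int) -> list[int]:
    """The first `count` terms u_1, u_2, ... with u_1 = 1, u_2 = 2."""
    terms = [1, 2]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


def inequality_holds(count: int) -> bool:
    """Check u_n < (7/4)^n exactly for n = 1, ..., count."""
    return all(u < BOUND**n for n, u in enumerate(sequence(count), start=1))


if __name__ == "__main__":
    print(sequence(7))
    # The inductive step needs only this one inequality: 7/4 + 1 < (7/4)^2.
    print(BOUND + 1, "<", BOUND**2, ":", BOUND + 1 < BOUND**2)
    print(inequality_holds(200))
```

Using `Fraction` rather than floating point keeps every comparison exact, so rounding cannot affect the check.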

To avoid the artificial procedure of the last proof, it is frequently 
convenient to use the following formulation of the principle of induction,
which can be shown to be equivalent to the first: if $P(n_0)$ is
valid, and if for every $n \ge n_0$ the propositions
$P(n_0), P(n_0 + 1), \ldots, P(n)$ together imply P(n + 1), then P(n) is valid for every
$n \ge n_0$.

Using this formulation, we could have taken P(n) to be the assertion
$u_n < (7/4)^n$ in the second example.

Besides the principle of induction, we shall have occasion to use 
three other properties of integers which the reader may not have 
encountered explicitly before: 

(a) Every nonempty set of positive integers (or of non-negative 
integers) has a smallest element. 

(b) If a and b are positive integers, there exists a positive integer n 
such that na > b. 

(c) Let n be a positive integer. If a set of n + 1 elements is sub- 
divided into n or fewer subsets, in such a way that each element 
belongs to precisely one subset, then some subset contains more than 
one element. 


These assertions, which are consequences of the underlying axioms, 
will be assumed without proof. 


PROBLEMS 


1. Show that $\tau(n)$ is odd if n is a square.
2. Prove that

$$\text{(a)} \ \sum_{m=1}^{n} m = \frac{n(n+1)}{2}, \qquad \text{(b)} \ \sum_{m=1}^{n} m^2 = \frac{n(n+1)(2n+1)}{6},$$

$$\text{(c)} \ \sum_{m=1}^{n} m^3 = \frac{n^2(n+1)^2}{4},$$

first by induction on n, and second by writing

$$\sum_{m=1}^{n+1} m^k = \sum_{m=1}^{n+1} ((m-1)+1)^k$$

and applying the binomial theorem to the summands on the right. Since
the terms $\sum_1^n m^k$ drop out, this method can be used with k = 2 to prove
(a), then with k = 3 to prove (b), etc.

3. Prove by induction that no two consecutive elements of the Fibonacci
sequence $u_1, u_2, \ldots$ have a common divisor greater than 1.

4. Carry out the second proof of the inequality

$$u_n < (7/4)^n$$

as indicated in the text.

5. Prove by induction that every integer greater than 1 can be repre- 
sented as a product of primes. 

6. Anticipating Theorem 1-1, suppose that every integer can be written
in the form 6k + r, where k is an integer and r is one of the numbers
0, 1, 2, 3, 4, 5.

(a) Show that if p = 6k + r is a prime different from 2 and 3, then
r = 1 or 5.

(b) Show that the product of numbers of the form 6k + 1 is of the same 
form. 

(c) Show that there exists a prime of the form 6k — 1 = 6(k — 1) + 5. 

(d) Show that there are infinitely many primes of the form 6k — 1. 


1-3 Radix representation. Although we have assumed a knowl- 
edge of the structure of the system of integers, we have said nothing 
about the method by which we will assign names to the integers. 
There are, of course, various ways of doing this, of which the Roman 
and decimal systems are probably the best known. While the decimal 
system has obvious advantages over Roman numerals, and the 
advantage of familiarity over any other method, it is not always the 
best system for theoretical purposes. A rather more general scheme
is sometimes convenient, and it is the object of the following two
theorems to show that this kind of representation is possible, i.e.,
that each integer can be given a unique name. Here, and until 
Chapter 6, lower-case Latin letters will denote integers. 


THEOREM 1-1. If a is positive and b is arbitrary, there is exactly one
pair of integers q, r such that the conditions

$$b = qa + r, \quad 0 \le r < a, \tag{4}$$

hold.


Proof: First, we show that (4) has at least one solution. 
Consider the set D of integers of the form $b - ua$, where u runs
over all integers, positive and nonpositive. For the particular choice

$$u = \begin{cases} 0 & \text{if } b \ge 0, \\ b & \text{if } b < 0, \end{cases}$$

the number $b - ua$ is non-negative, so that D contains non-negative
elements. The subset consisting of the non-negative elements of D
has a smallest element. Take r to be this number, and q the value of u
which corresponds to it. Then

$$r = b - qa \ge 0, \quad r - a = b - (q+1)a < 0,$$

so that (4) is satisfied.
To show the uniqueness, assume that also

$$b = q'a + r', \quad 0 \le r' < a.$$

Then if $q' < q$,

$$b - q'a = r' \ge b - (q-1)a = r + a \ge a,$$

while if $q' > q$,

$$b - q'a = r' \le b - (q+1)a = r - a < 0.$$

Hence $q' = q$, $r' = r$.
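The existence half of this proof is constructive enough to run. The sketch below starts from a value of u for which b − ua is non-negative (u = 0 when b is non-negative, u = b otherwise, as in the proof) and then steps down to the smallest non-negative element of the set D. The function name `divide` is illustrative; the comparison with Python's built-in `divmod`, which returns a remainder in the same range when the divisor is positive, serves as a cross-check.

```python
def divide(b: int, a: int) -> tuple[int, int]:
    """Return (q, r) with b == q*a + r and 0 <= r < a, for a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    u = 0 if b >= 0 else b  # starting choice from the proof: b - u*a >= 0
    r = b - u * a
    while r >= a:  # move to the smallest non-negative element of D
        u += 1
        r -= a
    return u, r


if __name__ == "__main__":
    for b, a in [(17, 5), (-7, 3), (0, 4), (-12, 4)]:
        q, r = divide(b, a)
        print(f"{b} = {q}*{a} + {r}")
```

The loop follows the proof rather than efficiency; in practice one would call `divmod(b, a)` directly.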


THEOREM 1-2. Let g be greater than 1. Then each a greater than 0
can be represented uniquely in the form

$$a = c_0 + c_1 g + \cdots + c_n g^n,$$

where $c_n$ is positive and $0 \le c_m < g$ for $0 \le m \le n$.




Proof: We prove the representability by induction on $a$. For $a = 1$
we have $n = 0$, $c_0 = 1$.

Take $a$ greater than 1 and assume that the theorem is true for
$1, 2, \ldots, a - 1$. Since $g$ is larger than 1, the numbers $g^0, g^1, g^2, \ldots$ form
an increasing sequence, and any positive integer lies between some
pair of successive powers of $g$. More precisely, there is a unique
$n \ge 0$ such that $g^n \le a < g^{n+1}$. By Theorem 1-1,

\[ a = c_n g^n + r, \qquad 0 \le r < g^n. \]

Here $c_n > 0$, since $c_n g^n = a - r > g^n - g^n = 0$; moreover, $c_n < g$
because $c_n g^n \le a < g^{n+1}$. If $r = 0$,

\[ a = 0 + 0 \cdot g + \cdots + 0 \cdot g^{n-1} + c_n g^n; \]

while if $r$ is positive, the induction hypothesis shows that $r$ has a
representation of the form

\[ r = b_0 + b_1 g + \cdots + b_t g^t, \]

where $b_t$ is positive and $0 \le b_m < g$ for $0 \le m \le t$. Moreover,
$t < n$. Thus

\[ a = b_0 + b_1 g + \cdots + b_t g^t + 0 \cdot g^{t+1} + \cdots + 0 \cdot g^{n-1} + c_n g^n. \]

Now use the induction principle.
To prove uniqueness, assume that

\[ a = c_0 + c_1 g + \cdots + c_n g^n = d_0 + d_1 g + \cdots + d_r g^r, \]

with $n \ge 0$, $c_n > 0$, and $0 \le c_m < g$ for $0 \le m \le n$, and also $r \ge 0$,
$d_r > 0$, and $0 \le d_m < g$ for $0 \le m \le r$. Then, by subtraction,
we have

\[ 0 = e_0 + e_1 g + \cdots + e_s g^s, \]

where $e_m = c_m - d_m$ and where $s$ is the largest value of $m$ for which
$c_m \ne d_m$, so that $e_s \ne 0$. If $s = 0$, we have the contradiction
$e_0 = e_s = 0$. If $s > 0$ we have, since

\[ |e_m| = |c_m - d_m| \le g - 1 \]

and

\[ e_s g^s = -(e_0 + \cdots + e_{s-1} g^{s-1}), \]

the chain of inequalities

\[ g^s \le |e_s g^s| = |e_0 + \cdots + e_{s-1} g^{s-1}| \le |e_0| + \cdots + |e_{s-1}| g^{s-1}
   \le (g - 1)(1 + g + \cdots + g^{s-1}) = g^s - 1, \]

which is also a contradiction. We conclude that $n = r$ and $c_m = d_m$
for $0 \le m \le n$, and the representation is unique.


By means of Theorem 1-2 we can construct a system of names or
symbols for the positive integers in the following way. We choose
arbitrary symbols to stand for the digits (i.e., the non-negative
integers less than $g$) and replace the number

\[ c_0 + c_1 g + \cdots + c_n g^n \]

by the simpler symbol $c_n c_{n-1} \cdots c_1 c_0$. For example, choosing $g$ to be
ten, and giving the smaller integers their customary symbols, we have
the ordinary decimal system, in which, for example, 2743 is an
abbreviation for the value of the polynomial $2x^3 + 7x^2 + 4x + 3$ when $x$
is ten. But there is no reason why we must use ten as the base, or
radix; if we used seven instead, we would write the integer whose
decimal representation is 2743 as 10666, since

\[ 2743 = 6 + 6 \cdot 7 + 6 \cdot 7^2 + 0 \cdot 7^3 + 1 \cdot 7^4. \]


To indicate the base that is being used, we might write a subscript
(in the decimal system), so that

\[ (2743)_{10} = (10666)_7. \]

Of course, if the radix is larger than $(10)_{10}$, it will be necessary to
invent symbols to replace $(10)_{10}, (11)_{10}, \ldots, g - 1$. For example,
taking $g = (12)_{10}$ and putting $(10)_{10} = \alpha$, $(11)_{10} = \beta$, we have

\[ (14)_{12} + (7)_{12} = (1\beta)_{12} \]

and

\[ (31)_{12} \cdot (\alpha)_{12} = (37)_{10} \cdot (10)_{10} = (370)_{10} = (26\alpha)_{12}. \]
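The radix representation of Theorem 1-2 can be computed by repeated use of Theorem 1-1. The following Python sketch is an illustration only: the function names `to_base` and `from_base` are hypothetical, and the code extracts the lowest digit first, whereas the proof above picks out the leading digit $c_n$ first. Digits are returned as integers, so the symbols $\alpha$ and $\beta$ of the text appear as 10 and 11.

```python
def to_base(a, g):
    """Digits [c_n, ..., c_1, c_0] of a > 0 in base g > 1 (most significant first)."""
    if a <= 0 or g <= 1:
        raise ValueError("need a > 0 and g > 1")
    digits = []
    while a > 0:
        # Theorem 1-1: a = q*g + r with 0 <= r < g; r is the next digit.
        a, r = divmod(a, g)
        digits.append(r)
    return digits[::-1]


def from_base(digits, g):
    """Value of c_0 + c_1*g + ... + c_n*g^n for digits given most significant first."""
    value = 0
    for c in digits:
        value = value * g + c
    return value


print(to_base(2743, 7))   # digits of 2743 in base seven
print(to_base(370, 12))   # digits of 370 in base twelve
```

Under these assumptions, `to_base(2743, 7)` reproduces the digits 1, 0, 6, 6, 6 of the example above.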


PROBLEMS 


1. (a) Show that any integral weight less than $2^{n+1}$ can be weighed
using only the standard weights $1, 2, 2^2, \ldots, 2^n$, by putting the unknown
weight on one pan of the balance and a suitable combination of standard
weights on the other pan.

(b) Prove that no other set of $n + 1$ weights will do this. [Hint: Name
the weights so that $w_0 \le w_1 \le \cdots \le w_n$. Let $k$ be the smallest index such
that $w_k \ne 2^k$ and obtain a contradiction, using the fact that the number of
nonempty subsets of a set of $n + 1$ elements is $2^{n+1} - 1$.]

2. Construct the addition and multiplication tables for the duodecimal 
digits, i.e., the digits in base twelve. Using these tables, evaluate 


\[ (21\alpha 9)_{12} \cdot (8370)_{12}. \]




3. Let $u_1, u_2, \ldots$ be the Fibonacci sequence defined in the preceding
section.
(a) Prove by induction (or otherwise) that for $n > 0$,

\[ u_{n-1} + u_{n-3} + u_{n-5} + \cdots < u_n, \]

the sum on the left continuing so long as the subscripts are positive.

(b) Show that every positive integer can be represented in a unique way
in the form $u_{n_1} + u_{n_2} + \cdots + u_{n_k}$, where $k \ge 1$ and $n_{j-1} \ge n_j + 2$ for
$j = 2, 3, \ldots, k$.


CHAPTER 2 


THE EUCLIDEAN ALGORITHM AND 
ITS CONSEQUENCES 


2-1 Divisibility. Let $a$ be different from zero, and let $b$ be
arbitrary. Then, if there is a $c$ such that $b = ac$, we say that $a$ divides $b$,
and write $a \mid b$ (negation: $a \nmid b$). As usual, the letters involved
represent integers.

The following statements are immediate consequences of this
definition:

(a) For every $a \ne 0$, $a \mid 0$ and $a \mid a$. For every $b$, $\pm 1 \mid b$.

(b) If $a \mid b$ and $b \mid c$, then $a \mid c$.

(c) If $a \mid b$ and $a \mid c$, then $a \mid (bx + cy)$ for each $x$, $y$. (If $a \mid b$ and $a \mid c$,
then $a$ is said to be a common divisor of $b$ and $c$.)


2-2 The Euclidean algorithm and greatest common divisor 


THEOREM 2-1. Given any two integers $a$, $b$ not both zero, there is a
unique integer $d$ such that

(a) $d > 0$;

(b) $d \mid a$ and $d \mid b$;

(c) if $d_1 \mid a$ and $d_1 \mid b$, then $d_1 \mid d$.

Since $x \mid y$ implies that $|x| \le |y|$, we call the $d$ of Theorem 2-1 the
greatest common divisor (abbreviated GCD) of $a$ and $b$, and write
$d = (a, b)$.


Proof: First let $a$ and $b$ be positive, and assume that $a \ge b$. Then,
by Theorem 1-1, there are unique integers $q_1$, $r_1$ such that

\[ a = bq_1 + r_1, \qquad 0 \le r_1 < b. \]

Repeated application of this theorem shows the existence of unique
pairs $q_2, r_2$; $q_3, r_3$; $\ldots$, such that

\[ \begin{aligned}
b &= r_1 q_2 + r_2, & 0 \le r_2 &< r_1, \\
r_1 &= r_2 q_3 + r_3, & 0 \le r_3 &< r_2,
\end{aligned} \]

and this may be continued until we reach a remainder, say $r_{k+1}$,
which is zero; the existence of such a $k$ is assured because $r_1, r_2, \ldots$ is
a decreasing sequence of non-negative integers. Thus the process
terminates:

\[ \begin{aligned}
r_{k-3} &= r_{k-2} q_{k-1} + r_{k-1}, & 0 < r_{k-1} &< r_{k-2}, \\
r_{k-2} &= r_{k-1} q_k + r_k, & 0 < r_k &< r_{k-1}, \\
r_{k-1} &= r_k q_{k+1}.
\end{aligned} \]

From the last equation we see that $r_k \mid r_{k-1}$; from the preceding
equation, using statement (b) of Section 2-1, we see that $r_k \mid r_{k-2}$, etc.
Finally, from the second and first equations, respectively, we have
that $r_k \mid b$ and $r_k \mid a$. Thus $r_k$ is a common divisor of $a$ and $b$. Now let
$d_1$ be any common divisor of $a$ and $b$. From the first equation, $d_1 \mid r_1$;
from the second, $d_1 \mid r_2$; etc.; from the equation before the last, $d_1 \mid r_k$.
Thus we can take the $d$ of the theorem to be $r_k$.

If $a < b$, interchange the names of $a$ and $b$. If either $a$ or $b$ is
negative, find the $d$ corresponding to $|a|$, $|b|$. If $a$ is zero, $(a, b) = |b|$.

If both $d_1$ and $d_2$ have the properties of the theorem, then $d_1$, being a
common divisor of $a$ and $b$, divides $d_2$. Similarly, $d_2 \mid d_1$. This clearly
implies that $d_1 = d_2$, and the GCD is unique.


The chain of operations indicated by the above equations is known
as the Euclidean algorithm; as will be seen, it is the cornerstone of
multiplicative number theory. (In general, an algorithm is a
systematic procedure which is applied repeatedly, each step depending
on the results of the earlier steps. Other examples are the long
division algorithm and the square root algorithm.) The Euclidean
algorithm is actually quite practicable in numerical cases; for
example, if we wish to find the GCD of 4147 and 10672, we have

\[ \begin{aligned}
10672 &= 4147 \cdot 2 + 2378, \\
4147 &= 2378 \cdot 1 + 1769, \\
2378 &= 1769 \cdot 1 + 609, \\
1769 &= 609 \cdot 2 + 551, \\
609 &= 551 \cdot 1 + 58, \\
551 &= 58 \cdot 9 + 29, \\
58 &= 29 \cdot 2.
\end{aligned} \]

Hence $(4147, 10672) = 29$.
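The chain of divisions above can be reproduced mechanically. The following Python sketch is illustrative only; the function name `euclid_steps` is hypothetical, and the inputs are assumed to be positive integers with the larger one first, as in the proof of Theorem 2-1.

```python
def euclid_steps(a, b):
    """Run the Euclidean algorithm on a >= b > 0.

    Returns (rows, d), where each row (dividend, divisor, quotient, remainder)
    records one division and d is the last nonzero remainder, i.e. the GCD.
    """
    rows = []
    while b != 0:
        q, r = divmod(a, b)      # a = b*q + r with 0 <= r < b (Theorem 1-1)
        rows.append((a, b, q, r))
        a, b = b, r
    return rows, a


rows, d = euclid_steps(10672, 4147)
for dividend, divisor, q, r in rows:
    print(f"{dividend} = {divisor} * {q} + {r}")
print("GCD:", d)
```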




It is frequently important to know whether two integers a and b 
have a common factor larger than 1. If they have not, so that 
(a, b) = 1, we say that they are relatively prime, or prime to each other. 

The following properties of the GCD are easily derived either from
the definition or from the Euclidean algorithm.

(a) The GCD of more than two numbers, defined as that positive
common divisor which is divisible by every common divisor, exists
and can be found in the following way. Let there be $n$ numbers
$a_1, a_2, \ldots, a_n$, and define

\[ D_1 = (a_1, a_2), \quad D_2 = (D_1, a_3), \quad \ldots, \quad D_{n-1} = (D_{n-2}, a_n). \]

Then $(a_1, a_2, \ldots, a_n) = D_{n-1}$.

(b) $(ma, mb) = m(a, b)$, if $m > 0$.

(c) If $m \mid a$ and $m \mid b$, then $(a/m, b/m) = (a, b)/m$.

(d) If $(a, b) = d$, there exist integers $x$, $y$ such that $ax + by = d$.
(An important consequence of this is that if $a$ and $b$ are relatively
prime, there exist $x$, $y$ such that $ax + by = 1$. Conversely, if there is
such a representation of 1, then clearly $(a, b) = 1$.)

(e) If a given integer is relatively prime to each of several others,
it is relatively prime to their product. For if $(a, b) = 1$ and $(a, c) = 1$,
there are $x$, $y$, $t$, and $u$ such that $ax + by = 1$ and $at + cu = 1$,
whence $ax + by(at + cu) = a(x + byt) + bc(yu) = 1$, and therefore
$(a, bc) = 1$.

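The Euclidean algorithm can be used to find the $x$ and $y$ of property
(d). Thus, using the numerical example above, we have

\[ \begin{aligned}
29 &= 551 - 58 \cdot 9 & (58 &= 609 - 551 \cdot 1) \\
   &= 551 - 9(609 - 551 \cdot 1) \\
   &= 10 \cdot 551 - 9 \cdot 609 & (551 &= 1769 - 2 \cdot 609) \\
   &= 10(1769 - 2 \cdot 609) - 9 \cdot 609 \\
   &= 10 \cdot 1769 - 29 \cdot 609 & (609 &= 2378 - 1 \cdot 1769) \\
   &= 10 \cdot 1769 - 29(2378 - 1 \cdot 1769) \\
   &= 39 \cdot 1769 - 29 \cdot 2378 & (1769 &= 4147 - 2378) \\
   &= 39(4147 - 2378) - 29 \cdot 2378 \\
   &= 39 \cdot 4147 - 68 \cdot 2378 & (2378 &= 10672 - 2 \cdot 4147) \\
   &= 39 \cdot 4147 - 68(10672 - 2 \cdot 4147) \\
   &= 175 \cdot 4147 - 68 \cdot 10672,
\end{aligned} \]

so that $x = 175$, $y = -68$.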
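The back-substitution above can also be carried forward alongside the divisions, so that no second pass is needed. The sketch below is one common way of organizing this (often called the extended Euclidean algorithm); it is not the layout used in the text, and the name `extended_gcd` is hypothetical. It assumes non-negative inputs that are not both zero.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = (a, b) and a*x + b*y = d, for a, b >= 0 not both zero."""
    # Invariants: old_r = a*old_x + b*old_y and r = a*x + b*y at every step.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


d, x, y = extended_gcd(4147, 10672)
print(d, x, y)                      # 29 175 -68
print(4147 * x + 10672 * y == d)    # True
```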


PROBLEMS 


1. Evaluate $(4655, 12075)$, and express the result as a linear combination
of 4655 and 12075, that is, in the form $4655x + 12075y$.

2. Show that if $(a, b) = 1$, then $(a - b, a + b) = 1$ or $2$.

3. Show that if $ax + by = m$, then $(a, b) \mid m$.

4. Show that no cancellation is possible in the fraction

\[ \frac{a_1 + a_2}{b_1 + b_2} \]

if $a_1 b_2 - a_2 b_1 = 1$.

5. Show that if $b \mid a$ and $c \mid a$, and $(b, c) = 1$, then $bc \mid a$.

6. Show that if $(b, c) = 1$, then $(a, bc) = (a, b)(a, c)$. [Hint: Prove
that each member of the last equation divides the other. Use property (d)
above, and the preceding problem.]

*7. Show that if $a + b \ne 0$, $(a, b) = 1$, and $p$ is an odd prime, then

\[ \left( a + b, \ \frac{a^p + b^p}{a + b} \right) = 1 \text{ or } p. \]

[Hint: If this GCD is $d$, then $a + b = kd$ and $(a^p + b^p)/(a + b) = ld$.
Replace $b$ and $a + b$ in the second equation by their values from the first,
apply the binomial theorem, and show that $d \mid p$.]

*8. In the notation introduced in the proof of Theorem 2-1, show that
each nonzero remainder $r_m$ ($m > 2$) is less than $r_{m-2}/2$. (Consider
separately the cases in which $r_{m-1}$ is less than, equal to, and greater than
$r_{m-2}/2$.) Deduce that the number of divisions in the Euclidean algorithm is
less than

\[ \frac{2 \log b}{\log 2} = 2.88\ldots \log b, \]

where $b$ is the larger of the two numbers whose GCD is being found. (Here
and elsewhere, "log" means the natural logarithm.)


2-3 The Unique Factorization Theorem 


THEOREM 2-2. Every integer a > 1 can be represented as a product 
of one or more primes. 


Proof: The theorem is true for $a = 2$. Assume it true for
$2, 3, 4, \ldots, a - 1$. If $a$ is prime, we are through. Otherwise $a$ has a divisor
different from 1 and $a$, and we have $a = bc$, with $1 < b < a$, $1 < c < a$.


*Here and in all problems throughout the book, an asterisk is used to 
indicate a particularly difficult problem. 




The induction hypothesis then implies that

\[ b = \prod_{i=1}^{s} p_i', \qquad c = \prod_{i=1}^{t} p_i'', \]

with $p_i'$, $p_i''$ primes, and hence $a = p_1' p_2' \cdots p_s' \, p_1'' \cdots p_t''$.

Any positive integer which is not prime and which is different from 
unity is said to be composite. Hereafter p will be used to denote a 
prime number, unless otherwise specified. 


THEOREM 2-3. If $a \mid bc$ and $(a, b) = 1$, then $a \mid c$.


Proof: If $(a, b) = 1$, there are integers $x$ and $y$ such that $ax + by = 1$,
or $acx + bcy = c$. But $a$ divides both $ac$ and $bc$, and therefore
divides $c$.


THEOREM 2-4. If

\[ p \ \Big| \ \prod_{m=1}^{n} p_m, \]

then for at least one $m$, $p = p_m$.


Proof: Suppose that $p \mid p_1 p_2 \cdots p_n$ but that $p$ is different from any
of the $p_1, p_2, \ldots, p_{n-1}$. Then $p$ is relatively prime to each of the
$p_1, \ldots, p_{n-1}$, and so is relatively prime to their product. By
Theorem 2-3, $p \mid p_n$, whence $p = p_n$.


THEOREM 2-5 (Unique Factorization Theorem). The representation
of $a > 1$ as a product of primes is unique up to the order of the
factors.


Proof: We must show exactly the following. From

\[ a = \prod_{m=1}^{n_1} p_m = \prod_{m=1}^{n_2} p_m', \qquad
   (p_1 \le p_2 \le \cdots \le p_{n_1}; \ p_1' \le p_2' \le \cdots \le p_{n_2}'), \]

it follows that $n_1 = n_2$ and $p_m = p_m'$ for $1 \le m \le n_1$.
For $a = 2$ the assertion is true, since $n_1 = n_2 = 1$, $p_1 = p_1' = 2$.
Take $a > 2$ and assume the assertion correct for $2, 3, \ldots, a - 1$.
(a) If $a$ is prime, $n_1 = n_2 = 1$, $p_1 = p_1' = a$.
(b) Otherwise $n_1 > 1$, $n_2 > 1$. From

\[ p_1' \ \Big| \ \prod_{m=1}^{n_1} p_m, \qquad p_1 \ \Big| \ \prod_{m=1}^{n_2} p_m' \]

it follows by Theorem 2-4 that for at least one $r$ and at least one $s$,

\[ p_1' = p_r, \qquad p_1 = p_s'. \]

Since

\[ p_1 \le p_r = p_1' \le p_s' = p_1, \]

we have $p_1 = p_1'$. Moreover, since $1 < p_1 < a$ and $p_1 \mid a$, we have

\[ 1 < \frac{a}{p_1} = \prod_{m=2}^{n_1} p_m = \prod_{m=2}^{n_2} p_m' < a, \]

and hence by the induction hypothesis,

\[ n_1 - 1 = n_2 - 1 \quad \text{and} \quad p_m = p_m' \ \text{ for } 2 \le m \le n_1. \]
Theorem 2-5, which appears natural enough when one is accustomed
to working only with the ordinary integers, assumes greater
significance when we encounter more general types of "integers" for
which it is not true.
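For small integers, the factorization whose existence and uniqueness are asserted by Theorems 2-2 and 2-5 can be found by trial division. The sketch below is an illustration under that assumption; trial division is not the method of the proofs above, and the name `prime_factors` is hypothetical.

```python
def prime_factors(a):
    """Return the primes p_1 <= p_2 <= ... whose product is a, for an integer a > 1."""
    if a <= 1:
        raise ValueError("need a > 1")
    factors = []
    p = 2
    while p * p <= a:
        while a % p == 0:     # p is the smallest divisor > 1 of a, hence prime
            factors.append(p)
            a //= p
        p += 1
    if a > 1:                 # whatever remains has no divisor <= sqrt(a), so it is prime
        factors.append(a)
    return factors


print(prime_factors(4147))    # [11, 13, 29]
print(prime_factors(10672))   # [2, 2, 2, 2, 23, 29]
```

The two factorizations share only the prime 29, in agreement with $(4147, 10672) = 29$.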


PROBLEMS

1. Show that if the reduced fraction $a/b$ is a root of the equation

\[ c_0 x^n + c_1 x^{n-1} + \cdots + c_n = 0, \]

where $x$ is a real variable and $c_0, c_1, \ldots, c_n$ are integers with $c_0 \ne 0$, then
$a \mid c_n$ and $b \mid c_0$. In particular, show that if $k$ is an integer then $\sqrt[n]{k}$ is rational
if and only if it is an integer.

2. The Unique Factorization Theorem shows that each integer $a > 1$
can be written uniquely as a product of powers of distinct primes. If the
primes that do not divide $a$ are included in this product with exponents 0,
we can write

\[ a = \prod_{i=1}^{\infty} p_i^{\alpha_i}, \]

where $p_i$ is the $i$th prime, $\alpha_i \ge 0$ for each $i$ and $\alpha_i = 0$ for sufficiently large
$i$, and the $\alpha_i$'s are uniquely determined by $a$. Show that if also

\[ b = \prod_{i=1}^{\infty} p_i^{\beta_i}, \]

then

\[ (a, b) = \prod_{i=1}^{\infty} p_i^{\min(\alpha_i, \beta_i)}, \]

where $\min(\alpha, \beta)$ is the smaller of $\alpha$ and $\beta$. Use this to give a different
solution of Problem 6, Section 2-2.




3. Show that the Diophantine equation

\[ x^2 - y^2 = N \]

is solvable in non-negative integers $x$ and $y$ if and only if $N$ is odd or
divisible by 4. Show further that the solution is unique if and only if $N$ or
$N/4$, respectively, is unity or a prime. [Hint: Factor the left side.]

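4. Show that the following identity is formally correct:

\[ \sum_{a=0}^{\infty} \frac{1}{2^{2a}} \cdot \sum_{b=0}^{\infty} \frac{1}{3^{2b}} \cdot \sum_{c=0}^{\infty} \frac{1}{5^{2c}} \cdots = \sum_{n=1}^{\infty} \frac{1}{n^2}. \]

The denominators occurring on the left are the even powers of the primes.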


2-4 The linear Diophantine equation. For simplicity, we con- 
sider only the equation in two variables 


\[ ax + by = c. \tag{1} \]


It is easy to devise a scheme for finding an infinite number of solutions
of this equation in case any exist; it can best be explained by means
of a numerical example, say $5x + 22y = 18$. Since $x$ is to be an
integer, $\frac{1}{5}(18 - 22y)$ must also be integral. Writing

\[ x = \frac{18 - 22y}{5} = 3 - 4y + \frac{3 - 2y}{5}, \]

we see that $\frac{1}{5}(3 - 2y)$ must also be an integer, say $z$. This gives

\[ z = \frac{3 - 2y}{5}, \qquad 2y + 5z = 3. \]

We now repeat the argument, solving as before for the unknown
which has the smaller coefficient:

\[ y = \frac{3 - 5z}{2} = 1 - 2z + \frac{1 - z}{2}, \qquad \frac{1 - z}{2} = t, \qquad z = 1 - 2t. \]

Clearly, $z$ will be an integer for any integral $t$, and we have

\[ \begin{aligned}
y &= \frac{3 - 5(1 - 2t)}{2} = -1 + 5t, \\
x &= \frac{18 - 22(-1 + 5t)}{5} = 8 - 22t.
\end{aligned} \]

Moreover, it is easily seen that any solution $x$, $y$ of the original
equation must be of this form, so that we have a general solution of the
equation.

The same idea could be applied in the general case, but it is
somewhat simpler to adopt a different approach. First of all, it should be
noticed that (1) has no solution unless $d \mid c$, where $d = (a, b)$, and that
if this requirement is satisfied, we can divide through in (1) by $d$ to
get a new equation

\[ a'x + b'y = c', \tag{2} \]

where now $(a', b') = 1$. We now use property (d) of Section 2-2 to
assert the existence of numbers $x_0'$, $y_0'$ such that

\[ a'x_0' + b'y_0' = 1, \]

so that $c'x_0'$, $c'y_0'$ is a solution of (2). Put $c'x_0' = x_0$, $c'y_0' = y_0$.
If $t$ is any integer, we have

\[ a'(x_0 + b't) + b'(y_0 - a't) = a'x_0 + b'y_0 = c', \]

so that $x_0 + b't$, $y_0 - a't$ is a solution of (2) for each $t$. Finally, if
$x_1$, $y_1$ is any solution of (2), we have

\[ a'x_0 + b'y_0 = c', \qquad a'x_1 + b'y_1 = c', \]

and, by subtraction,

\[ a'(x_0 - x_1) + b'(y_0 - y_1) = 0. \]

Thus $a' \mid (y_0 - y_1)$, $y_0 - y_1 = a't_1$, and $b' \mid (x_0 - x_1)$, $x_0 - x_1 = b't_2$.
This gives $x_1 = x_0 - b't_2$, $y_1 = y_0 - a't_1$, and, requiring that these
numbers satisfy (2), we have $t_2 = -t_1$. Hence every solution of (2)
is of the form $x_0 + b't$, $y_0 - a't$, and every such pair constitutes a
solution. Since every solution of (1) is a solution of (2) and conversely,
we have the following theorem.


THEOREM 2-6. A necessary and sufficient condition that the equation

\[ ax + by = c \]

have a solution $x$, $y$ in integers is that $d \mid c$, where $d = (a, b)$. If there
is one solution, there are infinitely many; they are exactly the numbers
of the form

\[ x = x_0 + \frac{b}{d}\,t, \qquad y = y_0 - \frac{a}{d}\,t, \]

where $t$ is an arbitrary integer.
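Theorem 2-6 translates directly into a procedure for solving $ax + by = c$. The Python sketch below is illustrative and assumes $a$ and $b$ are positive; the names `extended_gcd` and `solve_linear_diophantine` are hypothetical, and the particular solution it returns generally differs from the one found by inspection or by the elimination method at the start of the section.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = (a, b) and a*x + b*y = d, for positive a, b."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


def solve_linear_diophantine(a, b, c):
    """Solve a*x + b*y = c for positive a, b.

    Returns None if d = (a, b) does not divide c; otherwise returns
    (x0, y0, b // d, a // d), so that every solution has the form
    x = x0 + (b // d) * t, y = y0 - (a // d) * t for an integer t.
    """
    d, x, y = extended_gcd(a, b)
    if c % d != 0:
        return None
    k = c // d
    return x * k, y * k, b // d, a // d


x0, y0, dx, dy = solve_linear_diophantine(5, 22, 18)
for t in range(-2, 3):
    x, y = x0 + dx * t, y0 - dy * t
    print(x, y, 5 * x + 22 * y)   # the last column is always 18
```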




There are various ways of getting a particular solution. Sometimes 
one can be found by inspection; if not, the method explained at the 
beginning of the section may be used or, what is almost the same thing, 
the Euclidean algorithm may be applied to find a solution of the 
equation which results from dividing the original equation through 
by (a,b). The latter process of successively eliminating the re- 
mainders in the Euclidean algorithm can be systematized, but this 
we shall not do at present. (See Section 9-2.) 


PROBLEMS 


1. Find a general solution of the linear Diophantine equation

\[ 2072x + 1813y = 2849. \]

2. Find all solutions of $19x + 20y = 1909$ with $x > 0$, $y > 0$.

3. Let $m$ and $n$ be positive integers, with $m < n$, and let $x_0, x_1, \ldots, x_k$
be all the distinct numbers among the two sequences

\[ \frac{0}{m}, \frac{1}{m}, \ldots, \frac{m}{m} \qquad \text{and} \qquad \frac{0}{n}, \frac{1}{n}, \ldots, \frac{n}{n}, \]

arranged so that $x_0 < x_1 < \cdots < x_k$. Describe $k$ as a function of $m$ and $n$.
What is the shortest distance between successive $x$'s?

*4. Let $a$ and $b$ be positive relatively prime integers. Then for certain
non-negative integers $n$ (which we shall refer to briefly as the representable
integers), the equation $ax + by = n$ has a solution with $x \ge 0$, $y \ge 0$,
while for other $n$ it may not have. For example, if $n = 0, 3, 5,$ or $6$, or if
$n \ge 8$, then $3x + 5y = n$ has such a solution. Show that this example is
typical, in the following sense:

(a) There is always a number $N(a, b)$ such that for all $n \ge N(a, b)$,
$n$ is representable. (It may be helpful to combine the theory of the present
section with the elementary analytic geometry of the line $ax + by = c$,
interpreting $x$ and $y$ in the latter case as real variables. Note that so far it
is only the existence of $N(a, b)$ that is in question, and not its size.)

(b) The minimal value of $N(a, b)$ is always $(a - 1)(b - 1)$.

(c) Exactly half the integers up to $(a - 1)(b - 1)$ are representable.


2-5 The least common multiple 


THEOREM 2-7. The number $\langle a, b \rangle = \dfrac{|ab|}{(a, b)}$ has the following
properties: (a) $\langle a, b \rangle \ge 0$; (b) $a \mid \langle a, b \rangle$, $b \mid \langle a, b \rangle$; (c) If $a \mid m$ and $b \mid m$,
then $\langle a, b \rangle \mid m$.


Proof: (a) Obvious.
(b) Since $(a, b) \mid b$, we can write

\[ \langle a, b \rangle = |a| \cdot \frac{|b|}{(a, b)}, \]

and hence $a \mid \langle a, b \rangle$. Similarly,

\[ \langle a, b \rangle = |b| \cdot \frac{|a|}{(a, b)}, \]

and so $b \mid \langle a, b \rangle$.
(c) Let $m = ra = sb$, and put $d = (a, b)$, $a = a_1 d$, $b = b_1 d$. Then

\[ m = r a_1 d = s b_1 d; \]

thus $a_1 \mid s b_1$, and since $(a_1, b_1) = 1$, it must be that $a_1 \mid s$. Thus
$s = a_1 t$, and

\[ m = t a_1 b_1 d = t \cdot \frac{ab}{d}. \]


Because of the properties listed in Theorem 2-7, the number $\langle a, b \rangle$
is called the least common multiple (LCM) of $a$ and $b$. The definition
is easily extended to the case of more than two numbers, just as for
the GCD. It is useful to remember that

\[ ab = \pm (a, b) \langle a, b \rangle. \]
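The defining formula of Theorem 2-7 gives an immediate way to compute the LCM once the GCD is known. A minimal Python sketch follows; the function name `lcm` is chosen here for illustration, and the inputs are assumed to be integers that are not both zero.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple |a*b| / (a, b) of integers a, b not both zero."""
    return abs(a * b) // gcd(a, b)


print(lcm(4147, 10672))                          # 4147 * 10672 // 29
print(abs(4 * 6) == gcd(4, 6) * lcm(4, 6))       # True: |ab| = (a, b) * <a, b>
```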


PROBLEMS 
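1. In the notation of Problem 2, Section 2-3, show that

\[ \langle a, b \rangle = \prod_{i=1}^{\infty} p_i^{\max(\alpha_i, \beta_i)}, \]

where $\max(\alpha, \beta)$ is the larger of $\alpha$ and $\beta$.

*2. Show that

\[ \min(\alpha, \max(\beta, \gamma)) = \max(\min(\alpha, \beta), \min(\alpha, \gamma)). \]

(By symmetry, one may suppose $\beta \ge \gamma$.) Deduce that

\[ (a, \langle b, c \rangle) = \langle (a, b), (a, c) \rangle. \]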


CHAPTER 3 


CONGRUENCES 


3-1 Introduction. The problem of solving the Diophantine equation
$ax + by = c$ is just that of finding an $x$ such that $ax$ and $c$ leave
the same remainder when divided by $b$, since then $b \mid (c - ax)$ and we
can take $y = (c - ax)/b$. As we shall see, there are many other
instances also in which a comparison must be made of the remainders
after dividing each of two numbers $a$ and $b$ by a third, say $m$. Of
course, if the remainders are the same, then $m \mid (a - b)$, and
conversely, and this might seem to be an adequate notation. But, as
Gauss noticed, the following, for most purposes, is more suggestive:
if $m \mid (a - b)$, then we write $a \equiv b \pmod{m}$, and say that $a$ is
congruent to $b$ modulo $m$.

The use of the symbol "$\equiv$" is suggested by the similarity of the
relation we are discussing to ordinary equality. Each of these two
relations is an example of an equivalence relation; i.e., a relation $R$
between elements of a set, such that if $a$ and $b$ are arbitrary elements,
either $a$ stands in the relation $R$ to $b$ (more briefly, $a \, R \, b$) or not, and
having the following properties:

(a) $a \, R \, a$.

(b) If $a \, R \, b$, then $b \, R \, a$.

(c) If $a \, R \, b$ and $b \, R \, c$, then $a \, R \, c$.

These are called the reflexive, symmetric and transitive properties,
respectively. That ordinary equality between numbers is an
equivalence relation is obvious (or it may be taken as an axiom): either
$a = b$ or $a \ne b$; $a = a$; if $a = b$, then $b = a$; if $a = b$ and $b = c$,
then $a = c$.


THEOREM 3-1. Congruence modulo a fixed number m is an equiva- 
lence relation. 


Proof: (a) $m \mid (a - a)$, so that $a \equiv a \pmod{m}$.

(b) If $m \mid (a - b)$, then $m \mid (b - a)$; thus if $a \equiv b \pmod{m}$, then
$b \equiv a \pmod{m}$.

(c) If $m \mid (a - b)$ and $m \mid (b - c)$, then $a - b = km$, $b - c = lm$,
say, so that $a - c = (k + l)m$; thus if $a \equiv b \pmod{m}$ and $b \equiv c \pmod{m}$,
then $a \equiv c \pmod{m}$.


Since we shall have occasion later to use several other equivalence
relations, we pause to show a simple but important property enjoyed
by all such relations. If $R$ is an equivalence relation with respect to
a set $S$, then corresponding to each element $a$ of $S$ there is a subset $S_a$
of $S$ which consists of exactly those elements of $S$ which are
equivalent to $a$, so that $b$ is in $S_a$ if and only if $a \, R \, b$. Now if $a \, R \, b$, then the
sets $S_a$ and $S_b$ are identical: if $c$ is in $S_b$, then $c \, R \, b$, and since $a \, R \, b$,
also $c \, R \, a$, so that $c$ is in $S_a$. If, on the other hand, $a$ is not equivalent
to $b$, then $S_a$ and $S_b$ are disjoint; that is, they have no element in
common. For if $c$ is in $S_a$ and in $S_b$, then $c \, R \, a$ and $c \, R \, b$, which
entails $a \, R \, b$. These disjoint sets, which jointly exhaust $S$, are called
equivalence classes; an element of an equivalence class is sometimes
called a representative of the class, and a complete system of
representatives is any subset of $S$ which contains exactly one element from each
equivalence class.

Section 3-3 provides examples of all these notions, with somewhat 
different terminology. 


PROBLEM 


Decide whether each of the following is an equivalence relation. If it is, 
describe the equivalence classes. 

(a) Congruence of triangles. 

(b) Similarity of triangles. 

(c) The relations "$\ne$", "$>$", and "$\ge$", relating real numbers.

(d) Parallelism of lines. 

(e) Having the same mother. 

(f) Having a parent in common. 


3-2 Elementary properties of congruences. One reason for the 
superiority of the congruence notation is that congruences can be 
combined in much the same way as can equations. 


THEOREM 3-2. If $a \equiv b \pmod{m}$ and $c \equiv d \pmod{m}$, then
$a + c \equiv b + d \pmod{m}$, $ac \equiv bd \pmod{m}$, and $ka \equiv kb \pmod{m}$
for every integer $k$.


Proof: These statements follow immediately from the definition.
For if $m \mid (a - b)$ and $m \mid (c - d)$,
then $m \mid (a - b + c - d)$ and $m \mid ((a + c) - (b + d))$.
If $m \mid (a - b)$, then $m \mid k(a - b)$. Finally, if $m \mid (a - b)$ and $m \mid (c - d)$,
then $m \mid (a - b)(c - d)$. But

\[ (a - b)(c - d) = ac - bd + b(d - c) + d(b - a), \]

so that also $m \mid (ac - bd)$.


THEOREM 3-3. If $f(x)$ is a polynomial with integral coefficients, and
$a \equiv b \pmod{m}$, then $f(a) \equiv f(b) \pmod{m}$.

Proof: Let $f(x) = c_0 + c_1 x + \cdots + c_n x^n$.
If $a \equiv b \pmod{m}$, then for every non-negative integer $j$,

\[ a^j \equiv b^j \pmod{m}, \]

and

\[ c_j a^j \equiv c_j b^j \pmod{m}, \]

by Theorem 3-2. Adding these last congruences for $j = 0, 1, \ldots, n$,
we have the theorem.
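Theorem 3-3 can be checked numerically for any particular polynomial. The sketch below is only an illustration: the name `poly_mod` and the sample polynomial $f(x) = x^3 - 2x + 5$ are assumptions made here, and the evaluation uses Horner's rule, reducing modulo $m$ at each step, which Theorem 3-2 justifies.

```python
def poly_mod(coeffs, x, m):
    """Value of c_0 + c_1*x + ... + c_n*x^n reduced modulo m.

    coeffs = [c_0, c_1, ..., c_n]; the result lies in 0, 1, ..., m - 1.
    """
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % m
    return value


f = [5, -2, 0, 1]               # f(x) = x^3 - 2x + 5
a, b, m = 17, 3, 7              # 17 and 3 are congruent modulo 7
print(poly_mod(f, a, m), poly_mod(f, b, m))   # the two residues agree
```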


The situation is a little more complicated when we consider dividing
both sides of a congruence by an integer. We cannot deduce from
$ka \equiv kb \pmod{m}$ that $a \equiv b \pmod{m}$, for it may be that part of the
divisibility of $ka - kb = k(a - b)$ by $m$ is accounted for by the
presence of the factor $k$. What is clearly necessary is that the part
of $m$ which does not divide $k$ should divide $a - b$.


THEOREM 3-4. If $ka \equiv kb \pmod{m}$ and $(k, m) = d$, then

\[ a \equiv b \pmod{\frac{m}{d}}. \]


Proof: Theorem 2-3. 


PROBLEMS 
1. Let $f(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_n$,
where $a_0, \ldots, a_n$ are integers. Show that if $d$ consecutive values of $f$
(i.e., values for consecutive integers) are all divisible by the integer $d$, then
$d \mid f(x)$ for all integral $x$. Show by an example that this sometimes happens
with $d > 1$ even when $(a_0, \ldots, a_n) = 1$.

2. In Theorem 3-3 take $a = 10$, $b = 1$, $m = 9$ to deduce the rule that an
integer is divisible by 9 if and only if this is true of the sum of its digits.
What is the corresponding rule for divisibility by 11? Use the fact that
$7 \cdot 11 \cdot 13 = 1001$ to obtain a test for divisibility by any of the integers
7, 11, or 13.


3-3 Residue classes and Euler's $\varphi$-function. When dealing with
congruences modulo a fixed integer $m$, the set of all integers breaks
down into $m$ classes, such that any two elements of the same class are
congruent and two elements from two different classes are
incongruent. For many purposes it is completely immaterial which
element of one of these residue classes is used; for example, Theorem 3-3
shows this to be the case when one considers the values modulo $m$ of a
polynomial with integral coefficients. In these cases it suffices to
consider an arbitrary set of representatives of the various residue
classes; that is, a set consisting of one element of each residue class.
Such a set $a_1, a_2, \ldots, a_m$, called a complete residue system modulo $m$,
is characterized by the following properties.

(a) If $i \ne j$, then $a_i \not\equiv a_j \pmod{m}$.

(b) If $a$ is any integer, there is an index $i$ with $1 \le i \le m$ for which
$a \equiv a_i \pmod{m}$.

Examples of complete residue systems (mod $m$) are the set of
integers $0, 1, 2, \ldots, m - 1$, and the set $1, 2, \ldots, m$. The elements
of a complete residue system need not be consecutive integers,
however; for $m = 5$ we could take $1, 22, 13, -6, 2500$ as such a set.


THEOREM 3-5. If $a_1, a_2, \ldots, a_m$ is a complete residue system (mod $m$)
and $(k, m) = 1$, then also $ka_1, ka_2, \ldots, ka_m$ is a complete residue
system (mod $m$).


Proof: We show directly that properties (a) and (b) above hold
for this new set.

(a) If $ka_i \equiv ka_j \pmod{m}$, then by Theorem 3-4, $a_i \equiv a_j \pmod{m}$,
whence $i = j$.

(b) Theorem 2-6 shows that if $(k, m) = 1$, the congruence
$kx \equiv a \pmod{m}$ has a solution for any fixed $a$. Let a solution be $x_0$.
Since $a_1, \ldots, a_m$ is a complete residue system, there is an index $i$
such that $x_0 \equiv a_i \pmod{m}$. Hence $kx_0 \equiv ka_i \equiv a \pmod{m}$.
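Properties (a) and (b) are easy to test by machine for a given list of integers. The following Python sketch (with the hypothetical name `is_complete_residue_system`) checks them for the example $1, 22, 13, -6, 2500$ modulo 5 and then illustrates Theorem 3-5 with the multiplier $k = 3$, a value chosen here only for illustration.

```python
def is_complete_residue_system(values, m):
    """True if `values` has m entries hitting each residue class mod m exactly once."""
    residues = {v % m for v in values}          # Python's % gives a result in 0..m-1
    return len(values) == m and residues == set(range(m))


system = [1, 22, 13, -6, 2500]
print(is_complete_residue_system(system, 5))                   # True
print(is_complete_residue_system([3 * a for a in system], 5))  # True, since (3, 5) = 1
print(is_complete_residue_system([2 * a for a in range(6)], 6))  # False, since (2, 6) = 2
```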


The reason that we use the adjective "complete" when speaking of
a residue system is that there is another kind which is frequently
useful, called a reduced residue system. This is a set of integers $a_1, \ldots, a_h$,
incongruent (mod $m$), such that if $a$ is any integer prime to $m$, there
is an index $i$, $1 \le i \le h$, for which $a \equiv a_i \pmod{m}$. In other words,
a reduced residue system is a set of representatives, one from each of
the residue classes containing integers prime to $m$. (Clearly,
$(a, m) = (b, m)$ if $a \equiv b \pmod{m}$, since then $m \mid (a - b)$, so that
$(a, m) \mid (a - b)$, and hence $(a, m) \mid b$; this implies that $(a, m) \mid (b, m)$,
and also, by symmetry, that $(b, m) \mid (a, m)$.) The number $h$ is the
number of positive integers not exceeding $m$ and prime to $m$. This
function of $m$ is customarily designated by $\varphi(m)$, and is called
Euler's $\varphi$-function or the totient of $m$.


THEOREM 3-6. If $a_1, \ldots, a_{\varphi(m)}$ is a reduced residue system (mod $m$)
and $(k, m) = 1$, then also $ka_1, \ldots, ka_{\varphi(m)}$ is a reduced residue
system (mod $m$).


The proof is exactly parallel to that of Theorem 3-5.
Euler's $\varphi$-function has many interesting properties and, as we shall
see, it occurs repeatedly in number-theoretic investigations.


THEOREM 3-7. If $(m, n) = 1$, then $\varphi(mn) = \varphi(m)\varphi(n)$.
(A function with this property is called a multiplicative function.
For another example, see Problem 6, Section 2-2.)


Proof: Take integers $m$, $n$ with $(m, n) = 1$, and consider the
numbers of the form $mx + ny$. If we can so restrict the values which
$x$ and $y$ assume that these numbers form a reduced residue system
(mod $mn$), there must be $\varphi(mn)$ of them. But also their number is
then the product of the number of values which $x$ assumes and the
number of values which $y$ assumes. Clearly, in order for $mx + ny$
to be prime to $m$, it is necessary that $(m, y) = 1$, and likewise we
must have $(n, x) = 1$. Conversely, if these last two conditions are
satisfied, then $(mx + ny, mn) = 1$. Hence let $x$ range over a reduced
residue system (mod $n$), say $x_1, \ldots, x_{\varphi(n)}$, and let $y$ run over a
reduced residue system (mod $m$), say $y_1, \ldots, y_{\varphi(m)}$. If for some
indices $i$, $j$, $k$, $l$ we have

\[ mx_i + ny_j \equiv mx_k + ny_l \pmod{mn}, \]

then

\[ m(x_i - x_k) + n(y_j - y_l) \equiv 0 \pmod{mn}. \]

Since divisibility by $mn$ implies divisibility by $m$, we have

\[ \begin{aligned}
m(x_i - x_k) + n(y_j - y_l) &\equiv 0 \pmod{m}, \\
n(y_j - y_l) &\equiv 0 \pmod{m}, \\
y_j &\equiv y_l \pmod{m},
\end{aligned} \]

whence $j = l$. Similarly, $i = k$. Thus the numbers $mx + ny$ so
formed are incongruent (mod $mn$). Now let $a$ be any integer prime
to $mn$; in particular, $(a, m) = 1$ and $(a, n) = 1$. Then Theorem 2-6
shows that there are integers $X$, $Y$ (not necessarily in the chosen
reduced residue systems) such that $mX + nY = a$, whence also
$mX + nY \equiv a \pmod{mn}$. But there is an $x_i$ such that $X \equiv x_i \pmod{n}$,
and there is a $y_j$ such that $Y \equiv y_j \pmod{m}$. This means that there
are integers $k$, $l$ such that $X = x_i + kn$, $Y = y_j + lm$. Hence

\[ mX + nY = m(x_i + kn) + n(y_j + lm) \equiv mx_i + ny_j \equiv a \pmod{mn}. \]

Hence as $x$ and $y$ run over fixed reduced residue systems (mod $n$) and
(mod $m$) respectively, $mx + ny$ runs over a reduced residue system
(mod $mn$), and the proof is complete.


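THEOREM 3-8.

\[ \varphi(m) = m \prod_{p \mid m} \left( 1 - \frac{1}{p} \right), \]

where the notation indicates a product over all the distinct primes
which divide $m$.

Proof: By Theorem 3-7, if

\[ m = \prod_{i=1}^{r} p_i^{\alpha_i}, \]

then

\[ \varphi(m) = \prod_{i=1}^{r} \varphi(p_i^{\alpha_i}). \]

But we can easily evaluate $\varphi(p^{\alpha})$ directly; all the positive integers
not exceeding $p^{\alpha}$ are prime to $p^{\alpha}$ except the multiples of $p$, and there
are just $p^{\alpha - 1}$ of these. Hence

\[ \varphi(p_i^{\alpha_i}) = p_i^{\alpha_i} - p_i^{\alpha_i - 1} = p_i^{\alpha_i} \left( 1 - \frac{1}{p_i} \right), \]

and so

\[ \varphi(m) = \prod_{i=1}^{r} p_i^{\alpha_i} \left( 1 - \frac{1}{p_i} \right)
   = \prod_{i=1}^{r} p_i^{\alpha_i} \prod_{i=1}^{r} \left( 1 - \frac{1}{p_i} \right)
   = m \prod_{p \mid m} \left( 1 - \frac{1}{p} \right). \]

For example, the integers 1, 5, 7, 11 are all those which do not
exceed 12 and are prime to 12, and

\[ \varphi(12) = 12 \left( 1 - \tfrac{1}{2} \right) \left( 1 - \tfrac{1}{3} \right) = 4. \]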
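The definition of $\varphi(m)$ and the product formula of Theorem 3-8 can be compared numerically. In the Python sketch below, the names `phi_count` and `phi_formula` are hypothetical: the first counts directly from the definition, while the second applies the product formula using trial division to find the primes dividing $m$, an implementation choice made here and not taken from the text.

```python
from math import gcd


def phi_count(m):
    """phi(m) from the definition: count 1 <= a <= m with (a, m) = 1."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)


def phi_formula(m):
    """phi(m) from Theorem 3-8: m times the product of (1 - 1/p) over primes p | m."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            result -= result // p      # multiply result by (1 - 1/p), exactly
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:                          # one prime factor larger than sqrt(m) may remain
        result -= result // n
    return result


print(phi_count(12), phi_formula(12))                     # 4 4
print(phi_count(4 * 9) == phi_count(4) * phi_count(9))    # True, as (4, 9) = 1
```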


THEOREM 3-9.

\[ \sum_{d \mid n} \varphi(d) = n. \]


Proof: Let $d_1, \ldots, d_k$ be the positive divisors of $n$. We separate
the integers between 1 and $n$ inclusive into classes $C(d_1), \ldots, C(d_k)$,
putting an integer into the class $C(d_i)$ if its GCD with $n$ is $d_i$. The
number of elements in $C(d_i)$ is then

\[ \sum_{\substack{a \le n \\ (a, n) = d_i}} 1, \]

and since every integer up to $n$ is in exactly one of the classes,

\[ \sum_{d_i \mid n} \ \sum_{\substack{a \le n \\ (a, n) = d_i}} 1 = n. \]

The number of integers $a$ such that $a \le n$ and $(a, n) = d_i$ is exactly
equal to the number of integers $b$ such that $b \le n/d_i$ and $(b, n/d_i) = 1$;
in fact, multiplying the $b$'s by $d_i$, we get the $a$'s. But from the
definition of the Euler function, the number of $b$'s is clearly $\varphi(n/d_i)$.
Thus

\[ \sum_{d_i \mid n} \varphi\!\left( \frac{n}{d_i} \right) = n, \]

which is equivalent to the theorem, since, as $d_i$ runs over the divisors
of $n$, $n/d_i$ also runs over these divisors, but in reverse order.
To illustrate the theorem and its proof, take $n = 12$. Then

\[ \varphi(1) + \varphi(2) + \varphi(3) + \varphi(4) + \varphi(6) + \varphi(12)
   = 1 + 1 + 2 + 2 + 2 + 4 = 12, \]

\[ C(1) = \{1, 5, 7, 11\}, \quad C(2) = \{2, 10\}, \quad C(3) = \{3, 9\}, \]

\[ C(4) = \{4, 8\}, \quad C(6) = \{6\}, \quad C(12) = \{12\}. \]
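The classes $C(d)$ used in the proof can be generated directly. The Python sketch below (function name `gcd_classes` hypothetical) sorts the integers $1, \ldots, n$ by their GCD with $n$; for $n = 12$ it reproduces the classes listed above.

```python
from math import gcd


def gcd_classes(n):
    """Map each divisor d of n to the list C(d) of 1 <= a <= n with (a, n) = d."""
    classes = {}
    for a in range(1, n + 1):
        classes.setdefault(gcd(a, n), []).append(a)
    return classes


classes = gcd_classes(12)
for d in sorted(classes):
    print(d, classes[d])
print(sum(len(c) for c in classes.values()))   # 12
```

Each class $C(d)$ has $\varphi(12/d)$ elements, so the class sizes add up to 12, as the theorem asserts.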
PROBLEMS 
*1. Prove that if $(a, b) = d$, then

\[ \varphi(ab) = \frac{d \, \varphi(a) \varphi(b)}{\varphi(d)}. \]

2. Show that if $n > 1$, then the sum of the positive integers less than $n$
and prime to it is

\[ \frac{n \varphi(n)}{2}. \]

[Hint: If $m$ satisfies the conditions, so does $n - m$.]




3. Show that if $d \mid n$, then $\varphi(d) \mid \varphi(n)$.

4. Let $n$ be positive. Show that any solution of the equation
$$\varphi(x) = 4n + 2$$
is of one of the forms $p^\alpha$ or $2p^\alpha$, where $p$ is a prime of the form $4s - 1$.
[Hint: Use the factorization of $\varphi(x)$ as given in Theorem 3-8.]

*5. Let $f(x)$ be a polynomial with integral coefficients, and let $\psi(n)$
denote the number of values
$$f(0), f(1), \ldots, f(n - 1)$$
which are prime to $n$.
(a) Show that $\psi$ is multiplicative:
$$\psi(mn) = \psi(m)\,\psi(n) \quad \text{if } (m, n) = 1.$$
(b) Show that
$$\psi(p^\alpha) = p^{\alpha - 1}(p - b_p),$$
where $b_p$ is the number of integers $f(0), f(1), \ldots, f(p - 1)$ which are
divisible by the prime $p$.

6. How many fractions $r/s$ are there satisfying the conditions
$$(r, s) = 1, \qquad 0 < r < s < n?$$


3-4 Linear congruences. Because of the analogy between congruences
and equations, it is natural to ask about the solution of congruences
involving one or more (integral) unknowns. In the case of an algebraic
congruence $f(x) \equiv 0 \pmod m$, where $f(x)$ is a polynomial in $x$ with
integral coefficients, we see by Theorem 3-3 that if $x = a$ is a solution,
so is every element of the residue class containing $a$. For this reason
it is customary, for such congruences, to list only the solutions between
0 and $m - 1$, inclusive, with the understanding that any $x$ congruent to
one of those listed is also a solution. Similarly, when mention is made of
the number of roots of a certain congruence, it is actually the number of
residue classes that is meant.

The simplest case to treat is the linear congruence in one unknown;
that is, the congruence
$$ax \equiv b \pmod m.$$
As we have already noticed, this is equivalent to the linear Diophantine
equation
$$ax - my = b,$$
and by Theorem 2-6 this equation is solvable if and only if $(a, m) \mid b$.




If it is solvable, and if $x_0, y_0$ is a solution, then a general solution is
$$x \equiv x_0 \pmod{\frac{m}{d}}, \qquad y \equiv y_0 \pmod{\frac{a}{d}},$$
where $d = (a, m)$. Among the numbers $x$ satisfying the first of these
congruences, the numbers
$$x_0, \quad x_0 + \frac{m}{d}, \quad x_0 + \frac{2m}{d}, \quad \ldots, \quad x_0 + \frac{(d - 1)m}{d}$$
are incongruent (mod $m$), while every other such $x$ is congruent
(mod $m$) to one of these. Hence we have the following theorem:


THEOREM 3-10. A necessary and sufficient condition that the congruence
$$ax \equiv b \pmod m$$
be solvable is that $(a, m) \mid b$. If this is the case, there are exactly $(a, m)$
solutions (mod $m$).


While Theorem 3-10 gives assurance of the existence of a solution 
under appropriate circumstances and predicts the number of such 
solutions, it says nothing about finding them. For this purpose the 
simplest procedure, if no solution can be found by inspection, is to 
convert the congruence to an equation and solve by the method given 
at the beginning of Section 2-4. 

Consider, for example, the congruence
$$34x \equiv 60 \pmod{98}.$$
Since $(34, 98) = 2$ and $2 \mid 60$, there are just two solutions, to be found
from
$$17x \equiv 30 \pmod{49}.$$
This is equivalent to $17x - 49y = 30$, and we get
$$x = \frac{49y + 30}{17} = 3y + 2 - \frac{2y + 4}{17}, \qquad t = \frac{2y + 4}{17},$$
$$y = \frac{17t - 4}{2} = 8t - 2 + \frac{t}{2}, \qquad z = \frac{t}{2}.$$
Take $z = 0$; then $t = 0$, $y = -2$, $x = -4$. Hence
$$x \equiv -4 \pmod{49},$$
and the two solutions of the original congruence are
$$x \equiv -4, \; 45 \pmod{98}.$$
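Theorem 3-10 and the reduction just carried out translate into a short routine. The sketch below is an illustration and not the text's own method: instead of the continued-division scheme of Section 2-4 it uses Python's built-in modular inverse `pow(a, -1, m)` (Python 3.8 or later), and the name `solve_linear_congruence` is an assumption.

```python
from math import gcd

def solve_linear_congruence(a: int, b: int, m: int) -> list:
    """All solutions x (mod m) of a*x = b (mod m), listed in 0..m-1."""
    d = gcd(a, m)
    if b % d != 0:
        return []                          # not solvable: (a, m) does not divide b
    a1, b1, m1 = a // d, b // d, m // d    # reduced congruence a1*x = b1 (mod m1)
    x0 = (b1 * pow(a1, -1, m1)) % m1       # unique solution modulo m/d
    return [x0 + k * m1 for k in range(d)] # the d incongruent solutions (mod m)

print(solve_linear_congruence(34, 60, 98))
```

For the example above the routine returns 45 and 94, the latter being the residue $-4 \pmod{98}$.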


The solution of a linear congruence in more than one unknown can
be effected by the successive solution of a (usually large) number of
congruences in a single unknown. Consider the congruence
$$a_1x_1 + a_2x_2 + \cdots + a_nx_n \equiv c \pmod m.$$
The obviously necessary condition for solvability, that $(a_1, \ldots, a_n, m)$
should divide $c$, is also sufficient, just as in the former case. For,
assuming it satisfied, we can divide through by $(a_1, \ldots, a_n, m)$ to get
$$a_1'x_1 + \cdots + a_n'x_n \equiv c' \pmod{m'},$$
where now $(a_1', \ldots, a_n', m') = 1$. If $(a_1', \ldots, a_{n-1}', m') = d'$, we
must have
$$a_n'x_n \equiv c' \pmod{d'};$$
since $(a_n', d') = 1$, this has just one solution (mod $d'$). Thus there
are $m'/d'$ numbers $x_n$ with $0 \le x_n < m'$ satisfying this congruence.
Substituting these into the preceding congruence, we get $m'/d'$
congruences in $n - 1$ unknowns, and the process can be repeated.
As an example, consider the congruence
$$2x + 7y \equiv 5 \pmod{12}.$$
Here $(2, 7, 12) = 1$. Since $(2, 12) = 2$, we must have
$$7y \equiv 5 \pmod 2,$$
which clearly gives $y \equiv 1 \pmod 2$, or $y \equiv 1, 3, 5, 7, 9, 11 \pmod{12}$.
These give
$$2x \equiv 10, 8, 6, 4, 2, 0 \pmod{12}$$
respectively, or
$$x \equiv 5, 4, 3, 2, 1, 0 \pmod 6.$$
Thus the solutions (mod 12) are
$$x, y \equiv 5, 1;\; 11, 1;\; 4, 3;\; 10, 3;\; 3, 5;\; 9, 5;\; 2, 7;\; 8, 7;\; 1, 9;\; 7, 9;\; 0, 11;\; 6, 11.$$
The general situation is given in the following theorem, which is 
easily proved by induction on the number of unknowns. 


THEOREM 3-11. The congruence
$$a_1x_1 + \cdots + a_nx_n \equiv c \pmod m$$
has just $dm^{n-1}$ or no solutions (mod $m$) according as $d \mid c$ or $d \nmid c$,
where $d = (a_1, \ldots, a_n, m)$.
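The count in Theorem 3-11 can be compared with an exhaustive search for small moduli. This is a brute-force sketch for checking purposes only; `count_solutions` and `predicted_count` are illustrative names.

```python
from functools import reduce
from itertools import product
from math import gcd

def count_solutions(coeffs, c, m):
    """Count tuples (x_1, ..., x_n) mod m with sum(a_i * x_i) = c (mod m)."""
    return sum(
        1
        for xs in product(range(m), repeat=len(coeffs))
        if (sum(a * x for a, x in zip(coeffs, xs)) - c) % m == 0
    )

def predicted_count(coeffs, c, m):
    """The count d * m**(n-1) of Theorem 3-11, or 0 if d does not divide c."""
    d = reduce(gcd, coeffs, m)
    return d * m ** (len(coeffs) - 1) if c % d == 0 else 0

print(count_solutions([2, 7], 5, 12), predicted_count([2, 7], 5, 12))
```

For $2x + 7y \equiv 5 \pmod{12}$ both functions give 12, in agreement with the list above.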

Turning now to the simultaneous solution of a system of linear
congruences, we consider the system
$$\alpha_1 x \equiv \beta_1 \pmod{m_1}, \quad \ldots, \quad \alpha_n x \equiv \beta_n \pmod{m_n};$$
$\alpha_i$ and $\beta_i$ integers.

Clearly, no $x$ satisfies all these congruences unless each is solvable
separately. Assuming that this is so, we can restrict our attention to
systems of the form
$$x \equiv c_1 \pmod{m_1}, \quad \ldots, \quad x \equiv c_n \pmod{m_n}.$$
It is clear that this system will have no solution unless every pair
has. From the first of the congruences
$$x \equiv c_i \pmod{m_i}, \qquad x \equiv c_j \pmod{m_j},$$
we get $x = c_i + m_i y$; substituting in the second yields
$$m_i y \equiv c_j - c_i \pmod{m_j},$$
and consequently it must be true that
$$(m_i, m_j) \mid (c_i - c_j).$$
If this is the case, then $y$ is unique (mod $m_j/(m_i, m_j)$), and $x$ is
unique (mod $m_i m_j/(m_i, m_j)$), that is, modulo the LCM of $m_i$ and $m_j$.
We have thus proved part of the following theorem.


THEOREM 3-12. A necessary and sufficient condition that the system
of congruences $x \equiv c_i \pmod{m_i}$ ($i = 1, 2, \ldots, n$) be solvable is
that for every pair of indices $i, j$ between 1 and $n$ inclusive,
$$(m_i, m_j) \mid (c_i - c_j).$$
The solution, if it exists, is unique modulo the LCM of $m_1, \ldots, m_n$.

Proof: To prove the sufficiency we must show the following. If
every pair from among the $n$ congruences is solvable, and if any two
of them are solved to give a single new congruence, then the $n - 1$
congruences consisting of this new one and the remaining $n - 2$ original
congruences also have the property that every pair from among them
is solvable. That is, assume that for all $i$ and $j$ with $1 \le i \le n$,
$1 \le j \le n$, it is true that $(m_i, m_j) \mid (c_i - c_j)$, and let the solution of
$$x \equiv c_1 \pmod{m_1}, \qquad x \equiv c_2 \pmod{m_2}$$
be
$$x \equiv f \pmod{\langle m_1, m_2 \rangle}.$$
Then we must show that for $3 \le i \le n$,
$$(m_i, \langle m_1, m_2 \rangle) \mid (c_i - f).$$
This can easily be seen by considering the exponent $\alpha$ of any prime $p$
which occurs in the prime-power factorization of $(m_i, \langle m_1, m_2 \rangle)$. Let
the exponent of $p$ in the factorization of $m_j$ be $\beta_j$, for $j = 1, 2, \ldots, n$.
Then $p$ occurs in $\langle m_1, m_2 \rangle$ with exponent $\max(\beta_1, \beta_2)$, so that
$$\alpha = \min(\beta_i, \max(\beta_1, \beta_2)) = \max(\min(\beta_1, \beta_i), \min(\beta_2, \beta_i)).$$
But our assumption is that
$$p^{\min(\beta_1, \beta_i)} \mid (c_1 - c_i) \quad \text{and} \quad p^{\min(\beta_2, \beta_i)} \mid (c_2 - c_i),$$
and since $p^{\beta_1} \mid (c_1 - f)$ and $p^{\beta_2} \mid (c_2 - f)$ we see, by writing
$$c_1 - c_i = (c_1 - f) + (f - c_i),$$
$$c_2 - c_i = (c_2 - f) + (f - c_i),$$
that
$$p^{\min(\beta_1, \beta_i)} \mid (c_i - f) \quad \text{and} \quad p^{\min(\beta_2, \beta_i)} \mid (c_i - f),$$
so that also $p^\alpha \mid (c_i - f)$. Since $p^\alpha$ was an arbitrary prime-power
factor of $(m_i, \langle m_1, m_2 \rangle)$, it follows that
$$(m_i, \langle m_1, m_2 \rangle) \mid (c_i - f),$$
and the sufficiency of the condition is proved.

Finally, solving the first two congruences simultaneously, we
get a solution which is unique (mod $\langle m_1, m_2 \rangle$); solving this with
the third, we get a solution unique (mod $\langle m_3, \langle m_1, m_2 \rangle \rangle$), that is,
unique (mod $\langle m_1, m_2, m_3 \rangle$), etc.


As a consequence of Theorem 3-12, we have the following important
result.

THEOREM 3-13 (Chinese Remainder Theorem). Every system of
linear congruences in which the moduli are relatively prime in pairs is
solvable, the solution being unique modulo the product of the moduli.
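The pairwise procedure in the proof of Theorem 3-12 can be sketched in code: merge two congruences into one modulo the LCM of their moduli, and repeat. The names `merge` and `solve_system` are illustrative, and the modular inverse is taken with Python's `pow(a, -1, m)` (Python 3.8 or later) rather than by the hand method of Section 2-4.

```python
from math import gcd

def merge(c1, m1, c2, m2):
    """Combine x = c1 (mod m1) and x = c2 (mod m2).

    Returns (f, lcm) with x = f (mod lcm), or None if the pair is unsolvable.
    """
    g = gcd(m1, m2)
    if (c2 - c1) % g != 0:
        return None                        # need (m1, m2) | (c1 - c2)
    m2g = m2 // g
    # Solve m1*y = c2 - c1 (mod m2) after dividing through by g.
    inv = pow(m1 // g, -1, m2g) if m2g > 1 else 0
    y = ((c2 - c1) // g * inv) % m2g
    lcm = m1 * m2g
    return (c1 + m1 * y) % lcm, lcm

def solve_system(congruences):
    """Solve x = c_i (mod m_i) for a list of (c_i, m_i) pairs."""
    c, m = 0, 1
    for ci, mi in congruences:
        merged = merge(c, m, ci, mi)
        if merged is None:
            return None
        c, m = merged
    return c, m

print(solve_system([(2, 4), (4, 6)]))      # moduli not relatively prime
print(solve_system([(2, 3), (3, 5), (2, 7)]))
```

When the moduli are relatively prime in pairs, the solvability test never fails and the final modulus is the product of the moduli, as in Theorem 3-13.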




PROBLEMS

1. Solve the congruence $6x + 15y \equiv 9 \pmod{18}$.

2. Solve simultaneously:
$$x \equiv 1 \pmod 2,$$
$$x \equiv 1 \pmod 3,$$
$$x \equiv 3 \pmod 4,$$
$$x \equiv 4 \pmod 5.$$

3. Suppose that the system of congruences
$$x \equiv a_i \pmod{m_i}, \qquad i = 1, 2, \ldots, n,$$
is to be solved, where $(m_i, m_j) = 1$ for all $i, j$ with $i \ne j$. Put
$$M = m_1 \cdots m_n,$$
and for $i = 1, \ldots, n$, let $y = b_i$ be a solution of the congruence
$$\frac{M}{m_i}\, y \equiv 1 \pmod{m_i}.$$
Then show that the solution $x$ of the original system is given by
$$x \equiv \sum_{i=1}^{n} \frac{M}{m_i}\, a_i b_i \pmod M.$$

4. Show that given $a$, $b$, and $n$, with $(a, b) = 1$, there is an $x$ such that
$$(ax + b, n) = 1.$$
[Hint: If $p \mid a$ and $p \mid n$, then $p \nmid (ax + b)$ for any $x$. If $p \mid n$ and $p \nmid a$,
there is a solution of
$$ax + b \equiv 1 \pmod p.$$
Use the Chinese Remainder Theorem.]


3-5 Congruences of higher degree. We consider now the congruence
$$f(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n \equiv 0 \pmod m,$$
where the $a_i$ are not all congruent to zero (mod $m$). If
$$m = \prod_{i=1}^{r} p_i^{\alpha_i},$$
then clearly the given congruence is equivalent to the system of
congruences
$$f(x) \equiv 0 \pmod{p_1^{\alpha_1}}, \quad \ldots, \quad f(x) \equiv 0 \pmod{p_r^{\alpha_r}}.$$
If for each $i$ with $1 \le i \le r$, $c_i$ is a root of $f(x) \equiv 0 \pmod{p_i^{\alpha_i}}$,
then by the Chinese Remainder Theorem there is a solution $x$ of
the system
$$x \equiv c_1 \pmod{p_1^{\alpha_1}}, \quad \ldots, \quad x \equiv c_r \pmod{p_r^{\alpha_r}},$$
and this $x$, which is unique modulo $m$, is a solution of the original
congruence. Consequently, the number of solutions of the original
congruence is the product of the numbers of roots of the congruences
modulo the prime-power divisors of $m$. Hence we can restrict our
attention to the case where the modulus is a power of a prime.
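The product rule for the number of roots is easy to observe on a small case. The sketch below simply counts roots by trying every residue; the name `count_roots` and the sample congruence $x^2 - 1 \equiv 0 \pmod{24}$ are illustrative choices, not taken from the text.

```python
def count_roots(coeffs, m):
    """Number of residues x (mod m) with f(x) = 0 (mod m).

    coeffs lists the coefficients of f, highest degree first.
    """
    def f(x):
        value = 0
        for c in coeffs:                   # Horner evaluation modulo m
            value = (value * x + c) % m
        return value
    return sum(1 for x in range(m) if f(x) == 0)

poly = [1, 0, -1]                          # x^2 - 1
print(count_roots(poly, 8), count_roots(poly, 3), count_roots(poly, 24))
```

Since $24 = 2^3 \cdot 3$, the count modulo 24 should equal the product of the counts modulo 8 and modulo 3.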

The reduction can easily be carried a step further, so that we have
only to consider the higher degree congruence with prime modulus,
together with a number of linear congruences with prime moduli.
The idea is that the solutions of
$$f(x) \equiv 0 \pmod{p^\alpha} \qquad (1)$$
are to be found among those of
$$f(x) \equiv 0 \pmod{p^\beta} \qquad (2)$$
with $\beta < \alpha$. Suppose that for some $\beta < \alpha$ a solution of (2) is known,
say $a$. (There may be others, of course.) Then every number
$a + tp^\beta$ is a solution of (2); it is desired to determine $t$ so that
$a + tp^\beta$ is also a solution of
$$f(x) \equiv 0 \pmod{p^{\beta+1}}. \qquad (3)$$
By Taylor's theorem,
$$f(a + tp^\beta) = f(a) + tp^\beta f'(a) + \frac{(tp^\beta)^2 f''(a)}{2!} + \cdots + \frac{(tp^\beta)^n f^{(n)}(a)}{n!}.$$
A term $c_j x^j$ in $f(x)$ leads to the term
$$\frac{j(j - 1) \cdots (j - k + 1)}{k!}\, c_j x^{j-k} = \binom{j}{k} c_j x^{j-k}$$
in $f^{(k)}(x)/k!$, so that the numbers $f^{(k)}(a)/k!$ are integers. Hence
$$f(a + tp^\beta) \equiv f(a) + tp^\beta f'(a) \pmod{p^{\beta+1}},$$
and (3) becomes
$$f(a) + tp^\beta f'(a) \equiv 0 \pmod{p^{\beta+1}}.$$




Now $p^\beta \mid f(a)$, so that this reduces to the linear congruence
$$f'(a) \cdot t \equiv -\frac{f(a)}{p^\beta} \pmod p,$$
of which the number of solutions is
$$\begin{cases} 0, & \text{if } p \mid f'(a) \text{ but } p \nmid f(a)/p^\beta, \\ p, & \text{if } p \mid f'(a) \text{ and } p \mid f(a)/p^\beta, \\ 1, & \text{if } p \nmid f'(a). \end{cases}$$

The general procedure should now be clear, if all solutions of (2)
with $\beta = 1$ are known. Choose one of them, say $a_1$. Corresponding
to it there are 0, 1, or $p$ solutions of (2) with $\beta = 2$, to be found by
solving a linear congruence. If there are no solutions, start over with
a different $a_1$. If there are solutions, choose one and find the
corresponding solutions of (2) with $\beta = 3$. If all possibilities are explored
in this way, all solutions of (1) can eventually be found.

Consider, for example, the congruence
$$f(x) = x^3 - 4x^2 + 5x - 6 \equiv 0 \pmod{27}.$$
We first search for roots of
$$x^3 - 4x^2 + 5x - 6 \equiv x^3 + 2x^2 + 2x \equiv 0 \pmod 3.$$
Trying successively 0, 1, 2, we find the only solution of this congruence
to be $x \equiv 0 \pmod 3$. Putting $x = 0 + 3t$, we now wish to find $t$'s
for which
$$f(0 + 3t) \equiv 0 \pmod 9.$$
As above, this reduces to
$$3f'(0)t \equiv -f(0) \pmod 9,$$
or
$$15t \equiv 6 \pmod 9,$$
or
$$5t \equiv 2 \pmod 3,$$
so that $t \equiv 1 \pmod 3$. Putting $t = 1 + 3t_1$, we get $x = 3 + 9t_1$,
and we ask that
$$f(3 + 9t_1) \equiv 0 \pmod{27}.$$
This gives
$$f(3) + 9t_1 f'(3) \equiv 0 \pmod{27},$$
or
$$9 \cdot 8 \cdot t_1 \equiv 0 \pmod{27},$$
$$t_1 \equiv 0 \pmod 3.$$
Thus $t_1 = 3t_2$ and $x = 3 + 27t_2$, so that the only solution of the
original congruence is
$$x \equiv 3 \pmod{27}.$$

If at any stage in the above argument there had been more than
one possibility, each of them would have had to be followed through
to obtain corresponding solutions.
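The step-by-step lifting from $p^\beta$ to $p^{\beta+1}$ can be sketched as follows. As a simplification, and unlike the text, the sketch does not solve the linear congruence for $t$; it simply tries each $t = 0, 1, \ldots, p - 1$ and keeps those for which $a + tp^\beta$ is a root modulo $p^{\beta+1}$. The function names are illustrative.

```python
def poly_eval(coeffs, x, mod):
    """Evaluate a polynomial (coefficients highest degree first) modulo mod."""
    value = 0
    for c in coeffs:
        value = (value * x + c) % mod
    return value

def roots_mod_prime_power(coeffs, p, alpha):
    """All roots of f(x) = 0 (mod p**alpha), lifted one power of p at a time."""
    roots = [a for a in range(p) if poly_eval(coeffs, a, p) == 0]
    modulus = p
    for _ in range(alpha - 1):
        next_modulus = modulus * p
        # Every root mod p**(beta+1) has the form a + t*p**beta with 0 <= t < p.
        roots = [
            a + t * modulus
            for a in roots
            for t in range(p)
            if poly_eval(coeffs, a + t * modulus, next_modulus) == 0
        ]
        modulus = next_modulus
    return sorted(roots)

print(roots_mod_prime_power([1, -4, 5, -6], 3, 3))
```

Applied to $x^3 - 4x^2 + 5x - 6$ with $p = 3$ and $\alpha = 3$, it reproduces the single root found above.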


PROBLEMS

1. Find all solutions of the congruence
$$x^3 - 3x^2 + 27 \equiv 0 \pmod{1125}.$$
[Answer: $x \equiv 51, 426, 801 \pmod{1125}$.]

2. If $f(x)$ is a nonconstant polynomial with integral coefficients, show
that it assumes composite values for arbitrarily large $x$. [Hint: Apply
Taylor's theorem to $f(m + k \cdot f(m))$.]

3. Suppose that the congruence $f(x) \equiv 0 \pmod p$ has as roots the $s$
numbers $x_1, \ldots, x_s$, which are distinct (mod $p$). Show that if $p \nmid f'(x_k)$
for $k = 1, \ldots, s$, then the congruence $f(x) \equiv 0 \pmod{p^\alpha}$ also has exactly
$s$ roots, for every $\alpha > 1$.


3-6 Congruences with prime moduli. If $f(x)$ and $f_1(x)$ are two
polynomials whose corresponding (integral) coefficients are congruent
modulo $m$, then we say that $f(x)$ and $f_1(x)$ are identically congruent
modulo $m$, and write
$$f(x) \equiv f_1(x) \pmod m. \qquad (4)$$
When there is no reference made to the numerical values of $x$ in such
a relation, it will always mean identical congruence. It should be
noted that (4) is not equivalent to the assertion
$$f(x) \equiv f_1(x) \pmod m \quad \text{for all } x,$$
since, for example, $x^3 \equiv x \pmod 3$ for all $x$, but $x^3$ and $x$ are not
identically congruent modulo 3.




If $g(x)$ is also a polynomial with integral coefficients, and $g(x)$ has
leading coefficient 1, then $f(x)$ can be divided by $g(x)$ in the usual
fashion to obtain a quotient $q_1(x)$ and a remainder $r_1(x)$. Both
$q_1$ and $r_1$ are polynomials with integral coefficients, and the degree
of $r_1$ is less than that of $g$. If now
$$q_1(x) \equiv q(x) \pmod m \quad \text{and} \quad r_1(x) \equiv r(x) \pmod m,$$
then
$$f(x) \equiv g(x)q(x) + r(x) \pmod m. \qquad (5)$$
Such division modulo $m$ is not always possible if the leading coefficient
of $g(x)$ is not 1, since fractional coefficients may then be encountered.
In the case of a prime modulus, however, it is possible to find an
integer $c$ such that $cg(x)$ has leading coefficient congruent to 1, and
so to carry out the division.

If in (5), $r(x) \equiv 0 \pmod m$, then $g(x)$ is said to divide $f(x)$ modulo $m$,
or to be a factor of $f(x)$ modulo $m$, and we write
$$g(x) \mid f(x) \pmod m.$$
If $f(x)$ has no nonconstant factor (mod $m$) of lower degree than
itself, it is said to be irreducible (mod $m$). If $(x - a) \mid f(x) \pmod m$,
then $a$ is said to be a zero of $f(x)$ (mod $m$), or a root of the congruence
$f(x) \equiv 0 \pmod m$.
In the case of prime modulus, the Euclidean algorithm can easily
be generalized, so that we can find the GCD (mod $p$) of any two
polynomials. For example, if
$$f(x) = x^3 + 2x^2 - x + 1, \qquad g(x) = x^2 - x + 1,$$
then
$$f(x) \equiv x \cdot g(x) + (x + 1) \pmod 3,$$
$$g(x) \equiv (x + 1)(x + 1) \pmod 3,$$
and so the GCD (mod 3) of $f(x)$ and $g(x)$ is the last nonvanishing
remainder, namely $x + 1$. But
$$f(x) \equiv (x + 3)g(x) + (x - 2) \pmod 5,$$
$$g(x) \equiv (x + 1)(x - 2) + 3 \pmod 5,$$
$$x - 2 \equiv 3(2x + 1) \pmod 5,$$
so that $f(x)$ and $g(x)$ are relatively prime (i.e., have no common
nonconstant divisor) modulo 5.
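The Euclidean algorithm for polynomials modulo a prime can be sketched in a few lines. The representation (coefficient lists, highest degree first) and the names `trim`, `poly_rem`, and `poly_gcd` are assumptions made for this illustration; the result is normalised to leading coefficient 1, and the two inputs are assumed not to be both identically zero (mod $p$). The inverse of a leading coefficient is taken with `pow(c, -1, p)` (Python 3.8 or later).

```python
def trim(poly, p):
    """Reduce coefficients mod p and drop leading zeros."""
    poly = [c % p for c in poly]
    while poly and poly[0] == 0:
        poly.pop(0)
    return poly

def poly_rem(f, g, p):
    """Remainder of f on division by g, modulo the prime p (g not zero mod p)."""
    f, g = trim(f, p), trim(g, p)
    inv = pow(g[0], -1, p)                 # makes the divisor effectively monic
    while len(f) >= len(g):
        factor = f[0] * inv % p
        for i, c in enumerate(g):          # cancel the leading term of f
            f[i] = (f[i] - factor * c) % p
        f = trim(f, p)
    return f

def poly_gcd(f, g, p):
    """Monic GCD (mod p) of f and g: the last nonvanishing remainder."""
    f, g = trim(f, p), trim(g, p)
    while g:
        f, g = g, poly_rem(f, g, p)
    inv = pow(f[0], -1, p)
    return [c * inv % p for c in f]

f, g = [1, 2, -1, 1], [1, -1, 1]           # x^3 + 2x^2 - x + 1 and x^2 - x + 1
print(poly_gcd(f, g, 3), poly_gcd(f, g, 5))
```

Here `[1, 1]` stands for $x + 1$ and `[1]` for the constant 1, matching the two computations above.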

If the leading coefficient of g(x) is not 1, it may be made so by 
multiplication by a suitable constant, and then one can find 
(f(x), cg(x)). 

It is now possible to prove theorems analogous to Theorems 2-1
through 2-5, and so to show that every polynomial is congruent to a
product of polynomials which are irreducible (mod $p$), and that this
representation is unique except for the order of factors and the
presence of a set of constant factors whose product is 1 (mod $p$).


Notice that this result is not valid when the modulus is composite,
for example,
$$(x - 1)x \equiv (x - 3)(x + 2) \pmod 6,$$
and each of the linear polynomials is of course irreducible.
Another assertion which holds only for prime modulus is that if
$$f(x)g(x) \equiv 0 \pmod p,$$
then either
$$f(x) \equiv 0 \quad \text{or} \quad g(x) \equiv 0 \pmod p.$$
For otherwise we may suppose, with no loss in generality, that the
leading coefficients of $f(x)$ and $g(x)$ are 1. But then the leading
coefficient of $f(x) \cdot g(x)$ is also 1, and therefore not 0.


THEOREM 3-14 (Factor theorem). If $a$ is a root of the congruence
$$f(x) \equiv 0 \pmod m,$$
then
$$(x - a) \mid f(x) \pmod m,$$
and conversely.

Proof: Take $g(x) = x - a$ in equation (5). Then $r(x) = r$ is
constant, and
$$f(x) \equiv (x - a)q(x) + r \pmod m.$$
Putting $x = a$, we see that $r \equiv 0 \pmod m$ if $f(a) \equiv 0 \pmod m$.
Conversely, if
$$f(x) \equiv (x - a)q(x) \pmod m,$$
then
$$f(a) \equiv 0 \pmod m.$$


THEOREM 3-15 (Lagrange's theorem). The congruence
$$f(x) \equiv 0 \pmod p,$$
in which
$$f(x) = a_0x^n + \cdots + a_n, \qquad a_0 \not\equiv 0 \pmod p,$$
has at most $n$ roots.

Proof: For $n = 1$ this follows from Theorem 3-10. Assume that
every congruence of degree $n - 1$ has at most $n - 1$ solutions, and
that $a$ is a root of the original congruence. Then
$$f(x) \equiv (x - a)q(x) \pmod p,$$
where $q(x)$ is not identically zero (mod $p$) and is of degree $n - 1$.
It therefore has at most $n - 1$ zeros, say $c_1, \ldots, c_r$, where $r \le n - 1$.
Then if $c$ is any number such that $f(c) \equiv 0 \pmod p$, then
$$(c - a)q(c) \equiv 0 \pmod p,$$
so that either
$$c \equiv a \pmod p$$
or
$$q(c) \equiv 0 \pmod p, \quad \text{that is, } c \equiv c_i \text{ for some } i, \; 1 \le i \le r.$$
In other words, the original congruence has at most $r + 1 \le n$ roots.
The theorem now follows by the induction principle.

Again, this theorem is not valid for composite modulus.
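A small search makes the failure for composite modulus concrete. The example $x^2 - 1$ with moduli 7 and 8 is chosen here for illustration and does not appear in the text; `roots` is an illustrative helper.

```python
def roots(coeffs, m):
    """All residues x (mod m) with f(x) = 0 (mod m); highest degree first."""
    found = []
    for x in range(m):
        value = 0
        for c in coeffs:                   # Horner evaluation modulo m
            value = (value * x + c) % m
        if value == 0:
            found.append(x)
    return found

print(roots([1, 0, -1], 7))                # prime modulus: at most 2 roots
print(roots([1, 0, -1], 8))                # composite modulus: 4 roots
```

The quadratic has two roots modulo 7, as Lagrange's theorem allows, but four roots modulo 8.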
PROBLEM 


Let $f(x)$ be a polynomial of degree $n$, with integral coefficients. Show
that if $n + 1$ consecutive values of $f(x)$ are divisible by a fixed prime $p$, then
$p \mid f(x)$ for every integral $x$. Cf. Problem 1, Section 3-2.


3-7 The theorems of Fermat, Euler, and Wilson 
THEOREM 3-16 (Fermat's theorem). If $p \nmid a$, then
$$a^{p-1} \equiv 1 \pmod p.$$
Since $\varphi(p) = p - 1$, this is a special case of

THEOREM 3-17 (Euler's theorem). If $(a, m) = 1$, then
$$a^{\varphi(m)} \equiv 1 \pmod m.$$




Proof: Let $c_1, \ldots, c_{\varphi(m)}$ be a reduced residue system (mod $m$),
and let $a$ be prime to $m$. Then $ac_1, \ldots, ac_{\varphi(m)}$ is also a reduced
residue system (mod $m$), and
$$\prod_{i=1}^{\varphi(m)} ac_i = a^{\varphi(m)} \prod_{i=1}^{\varphi(m)} c_i \equiv \prod_{i=1}^{\varphi(m)} c_i \pmod m.$$
Since $(m, \prod c_i) = 1$, this implies that
$$a^{\varphi(m)} \equiv 1 \pmod m.$$

We see from Euler's theorem that if we take the least positive
remainders (mod $m$) of the sequence of powers $a, a^2, a^3, \ldots$ of a
number $a$ which is prime to $m$, we will have a periodic sequence, of
period less than or equal to $\varphi(m)$. The period of this sequence, that
is, the least positive exponent $t$ such that $a^t \equiv 1 \pmod m$, is called
the order of $a$ (mod $m$), or the exponent to which $a$ belongs modulo $m$,
and we write $\operatorname{ord}_m a = t$.


THEOREM 3-18. If $a^u \equiv 1 \pmod m$, then $\operatorname{ord}_m a \mid u$.

Proof: Put $\operatorname{ord}_m a = t$, and let $u = qt + r$, $0 \le r < t$. Then
$$a^u = a^{qt + r} = (a^t)^q \cdot a^r \equiv a^r \equiv 1 \pmod m,$$
and if $r$ were different from zero, there would be a contradiction with
the definition of $t$.

THEOREM 3-19. For every $a$ prime to $m$, $\operatorname{ord}_m a \mid \varphi(m)$.

Proof: Follows immediately from Theorems 3-17 and 3-18.

As we shall see in the next chapter, the numbers $a$ of order $\varphi(m)$ are
of great importance.

The direct converse of Fermat's theorem does not hold; that is, it
is not true that if for some $a$, $a^{m-1} \equiv 1 \pmod m$, then $m$ is prime.
For example, the powers of 3, reduced modulo 91, are 3, 9, 27, 81, 61, 1,
so that $\operatorname{ord}_{91} 3 = 6$. Since $6 \mid 90$, $3^{90} \equiv 1 \pmod{91}$. But 91 is not
prime. The clue to the proper converse lies in the observation that
$\varphi(m) \le m - 1$ always, and $\varphi(m) = m - 1$ if and only if $m$ is prime, so
that $m$ will certainly be prime if there is an $a$ such that $\operatorname{ord}_m a = m - 1$.
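The order of $a$ (mod $m$) can be found directly from its definition by stepping through the powers of $a$. This is a straightforward sketch, adequate only for small $m$; the name `order` is illustrative.

```python
from math import gcd

def order(a: int, m: int) -> int:
    """Least positive t with a**t = 1 (mod m); requires (a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be prime to m")
    t, power = 1, a % m
    while power != 1 % m:                  # 1 % m also covers the case m = 1
        power = power * a % m
        t += 1
    return t

print(order(3, 91), pow(3, 90, 91))
```

For $a = 3$ and $m = 91$ this gives the order 6 quoted above, and the built-in `pow` confirms that $3^{90} \equiv 1 \pmod{91}$ although 91 is composite.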


THEOREM 3-20. If there is an $a$ for which $a^{m-1} \equiv 1 \pmod m$, while
none of the congruences $a^{(m-1)/p} \equiv 1 \pmod m$ holds, where $p$ runs
over the prime divisors of $m - 1$, then $m$ is prime.

Proof: By the first hypothesis and Theorem 3-18, the exponent $t$
to which $a$ belongs (mod $m$) divides $m - 1$. On the other hand, since
every proper divisor of $m - 1$ is a divisor of at least one of the numbers
$(m - 1)/p$, the second hypothesis and Theorem 3-18 imply
that $t$ is not a proper divisor of $m - 1$. Consequently $t = m - 1$.
By Theorem 3-19, $m - 1 \mid \varphi(m)$, and so $m - 1 = \varphi(m)$ and $m$ is
prime.


In a way, Theorem 3-20 is simply a restatement of the fact that
$\varphi(m) = m - 1$ if and only if $m$ is prime. But in distinction to this
statement, it can actually be used to investigate the primality of
large numbers.
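As a sketch of how Theorem 3-20 might be applied, the routine below checks its two hypotheses for a given pair $a$, $m$. The names are illustrative, and the prime divisors of $m - 1$ are found by trial division, which is an assumption of this sketch and practical only when $m - 1$ is easy to factor. A result of `False` is inconclusive: it says only that this particular $a$ does not certify $m$.

```python
def prime_divisors(n: int) -> list:
    """Distinct prime divisors of n, by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes

def certifies_prime(a: int, m: int) -> bool:
    """True if a satisfies both hypotheses of Theorem 3-20 for the modulus m."""
    if pow(a, m - 1, m) != 1:
        return False
    return all(pow(a, (m - 1) // p, m) != 1 for p in prime_divisors(m - 1))

print(certifies_prime(3, 7), certifies_prime(2, 7), certifies_prime(3, 91))
```

Here 3 certifies that 7 is prime, 2 does not (its order modulo 7 is only 3), and no base can certify the composite number 91.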


Fermat’s theorem exhibits congruences which have the maximum 
number of roots allowable by Lagrange’s theorem. The following 
theorem gives another important example of such a situation. 


THEOREM 3-21. If $p$ is prime and $d$ divides $p - 1$, then there are
exactly $d$ roots of the congruence
$$x^d \equiv 1 \pmod p.$$

Proof: Since $d \mid p - 1$,
$$x^{p-1} - 1 \equiv (x^d - 1)q(x) \pmod p,$$
where $q(x)$ is a polynomial of degree $p - 1 - d$ in $x$. By Lagrange's
theorem, the congruence
$$q(x) \equiv 0 \pmod p$$
has at most $p - 1 - d$ solutions. Since $x^{p-1} \equiv 1 \pmod p$ has exactly
$p - 1$ solutions, $x^d \equiv 1 \pmod p$ must have at least $p - 1 - (p - 1 - d) = d$
solutions. Since it can have no more than this, it must have exactly
$d$ solutions.
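Theorem 3-21 is easily checked for a small prime. The following sketch counts the roots of $x^d \equiv 1 \pmod p$ by direct search; the function name and the choice $p = 13$ are illustrative.

```python
def count_roots_of_unity(d: int, p: int) -> int:
    """Number of x in 1..p-1 with x**d = 1 (mod p)."""
    return sum(1 for x in range(1, p) if pow(x, d, p) == 1)

p = 13
for d in (1, 2, 3, 4, 6, 12):              # the divisors of p - 1 = 12
    print(d, count_roots_of_unity(d, p))
```

For each divisor $d$ of 12 the count printed is $d$ itself.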


As another consequence of Fermat's theorem, we have

THEOREM 3-22 (Wilson's theorem). If $p$ is prime, then
$$(p - 1)! \equiv -1 \pmod p.$$

Proof: Fermat's theorem and Theorem 3-14 show that
$$x^{p-1} - 1 \equiv (x - 1)(x - 2) \cdots (x - p + 1) \pmod p$$
identically, so that the constant terms must be congruent:
$$-1 \equiv (-1)^{p-1}(p - 1)! \pmod p.$$
If $p$ is odd, this gives the theorem. If $p = 2$, then we have
$$-1 \equiv 1 = 1! \pmod 2.$$

The converse of Wilson's theorem does hold.

THEOREM 3-23. If $m > 1$ and $(m - 1)! \equiv -1 \pmod m$, then $m$
is prime.

Proof: If $m$ is composite, it has a proper divisor $d > 1$. But then
$$(m - 1)! \equiv 0 \not\equiv -1 \pmod d,$$
and a fortiori,
$$(m - 1)! \not\equiv -1 \pmod m.$$
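Wilson's theorem and its converse together say that $(m - 1)! \equiv -1 \pmod m$ exactly when $m$ is prime. The sketch below compares this criterion with trial division for small $m$; the names are illustrative, and since the factorial grows very quickly the criterion is of no practical use as a primality test.

```python
from math import factorial, isqrt

def wilson_holds(m: int) -> bool:
    """True if (m - 1)! = -1 (mod m), for m > 1."""
    return factorial(m - 1) % m == m - 1

def is_prime(m: int) -> bool:
    """Primality by trial division, for comparison."""
    return m > 1 and all(m % k for k in range(2, isqrt(m) + 1))

print([m for m in range(2, 40) if wilson_holds(m)])
```

The list printed consists of the primes below 40.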

There is another way of obtaining Wilson's theorem which also
throws some light on a subject to be considered in much more detail
in Chapter 5. Let $a$ be any integer not divisible by the odd prime $p$,
and let $b$ be one of the numbers $1, \ldots, p - 1$. Then we know that
there is a unique solution (mod $p$) of the congruence $bx \equiv a \pmod p$.
Let $b'$, called the associate of $b$, be that positive solution which is less
than $p$. We must distinguish two cases, according as some $b$ is
associated with itself or not. If $b = b'$, then $b^2 \equiv a \pmod p$, so that the
congruence $x^2 \equiv a \pmod p$ has a solution; in this case $a$ is said to be
a quadratic residue of $p$. If the congruence $x^2 \equiv a \pmod p$ has no
solution, $a$ is called a quadratic nonresidue of $p$. (Similar definitions
hold for $n$th power residues and nonresidues.)

If $a$ is a quadratic residue of $p$, and if $b_1^2 \equiv a \pmod p$, then clearly
also $(p - b_1)^2 \equiv a \pmod p$; by Lagrange's theorem there are no
other solutions. Thus in this case the numbers $1, \ldots, p - 1$ can be
grouped into $(p - 3)/2$ pairs of associates, the product of each pair
being congruent to $a$ (mod $p$), together with the two numbers $b_1$
and $p - b_1$. Thus
$$(p - 1)! = \prod_{b=1}^{p-1} b \equiv a^{(p-3)/2} \cdot b_1(p - b_1) \equiv -a^{(p-1)/2} \pmod p. \qquad (6)$$
On the other hand, if $a$ is a quadratic nonresidue of $p$, the numbers
$1, 2, \ldots, p - 1$ can be grouped into $(p - 1)/2$ pairs of associates, and
$$(p - 1)! = \prod_{b=1}^{p-1} b \equiv a^{(p-1)/2} \pmod p. \qquad (7)$$
In order to give a uniform statement of (6) and (7), we define the
Legendre symbol $(a/p)$ (also frequently written $\left(\frac{a}{p}\right)$ or $(a \mid p)$) to
mean 1 if $a$ is a quadratic residue of $p$, and $-1$ if $a$ is a quadratic
nonresidue of $p$. Here $a$ is called the "first entry," and $p$ the "second
entry." (Note that $(a/p)$ is not yet defined if $p \mid a$.) Then (6) and (7)
give
$$(p - 1)! \equiv -(a/p)\,a^{(p-1)/2} \pmod p. \qquad (8)$$
Taking $a = 1$, and noting that the congruence $x^2 \equiv 1 \pmod p$ has
the solution $x = 1$, so that $(1/p) = 1$, we have $(p - 1)! \equiv -1 \pmod p$,
which is Wilson's theorem again. Substituting in (8), this gives
$$(a/p)\,a^{(p-1)/2} \equiv 1 \pmod p,$$
or since $(a/p) = \pm 1$,
$$(a/p) \equiv a^{(p-1)/2} \pmod p.$$
Thus we have proved

THEOREM 3-24 (Euler's criterion). A necessary and sufficient
condition that $a$ be a quadratic residue of an odd prime $p$ is that the
congruence
$$a^{(p-1)/2} \equiv 1 \pmod p$$
hold.
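Euler's criterion gives a direct way to evaluate the Legendre symbol. In the sketch below, `legendre` applies the criterion and `is_residue_by_search` decides the same question by looking for a square root; both names are illustrative.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p not dividing a."""
    if a % p == 0:
        raise ValueError("(a/p) is not defined here when p divides a")
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def is_residue_by_search(a: int, p: int) -> bool:
    """True if x*x = a (mod p) has a solution, found by trying every x."""
    return any((x * x - a) % p == 0 for x in range(1, p))

print([a for a in range(1, 13) if legendre(a, 13) == 1])
```

The values printed are the quadratic residues of 13 among $1, \ldots, 12$; the two functions agree for every $a$ prime to $p$.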


PROBLEMS

1. Show that if $ab \equiv 1 \pmod m$, then
$$\operatorname{ord}_m a = \operatorname{ord}_m b.$$

2. Show that if $p$ is an odd prime and $\operatorname{ord}_{p^\alpha} a = 2t$, then
$$a^t \equiv -1 \pmod{p^\alpha}.$$
Show that this need not be true if $p = 2$.

3. Show that if $p$ is an odd prime and $a^t \equiv -1 \pmod p$, then $a$ belongs to
an even exponent $2u$ (mod $p$), and $t$ is an odd multiple of $u$.

*4. Show that if $p$ is an odd prime and $p \mid (x^{2^n} + 1)$, then $p \equiv 1 \pmod{2^{n+1}}$.
Deduce that there are infinitely many primes congruent to 1 modulo any
fixed power of 2.

5. Show that for $a > 1$ and $n > 0$, $n \mid \varphi(a^n - 1)$.

6. Use Theorem 3-20 with $a = 2$ to show that 389 is prime.

*7. Show that if $(a, b) = 1$, $p$ is an odd prime not dividing $a + b$, and
$$d \,\Big|\, \frac{a^p + b^p}{a + b},$$
then $d \equiv 1 \pmod p$. Cf. Problem 7, Section 2-2. [Hint: Let $q$ be a prime
divisor of $(a^p + b^p)/(a + b)$, so that $a^p \equiv -b^p \pmod q$. Show that a $k$
exists such that $b \mid (kq + a)$, and put $r = (kq + a)/b$; then $r^p \equiv -1$
(mod $q$), so that $\operatorname{ord}_q(-r) = 1$ or $p$. If the first alternative is eliminated,
then $p \mid (q - 1)$.]

8. Show that the congruence $f(x) \equiv 0 \pmod p$, of degree $m < p$, has
$m$ roots if and only if $f(x) \mid (x^p - x) \pmod p$.

9. Use Theorem 3-21 and the method of Section 3-5 to show that if $p$
is prime and $d \mid p - 1$, then there are exactly $d$ roots (mod $p^n$) of the
congruence
$$x^d \equiv 1 \pmod{p^n},$$
where $n > 1$.

*10. Show that the Diophantine equation
$$(n - 1)! = n^k - 1$$
has only the solutions $n, k = 2, 1$; $3, 1$; and $5, 2$. [Hint: Prove and use
the following statements:
(a) There is no solution with $n$ even and larger than 2.
(b) $n - 1 \mid (n - 2)!$ if $n$ is odd and larger than 5.
(c) $(n - 1)^2 \mid (n^k - 1)$ only if $(n - 1) \mid k$. It is useful to write
$n^k - 1 = ((n - 1) + 1)^k - 1$.]


CHAPTER 4 


PRIMITIVE ROOTS AND INDICES 


4-1 Integers belonging to a given exponent (mod $p$)

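THEOREM 4-1. If $\operatorname{ord}_m a = t$, then $\operatorname{ord}_m a^n = t/(n, t)$.

Proof: Let $(n, t) = d$. Then, since $a^t \equiv 1 \pmod m$, we have
$$(a^t)^{n/d} = (a^n)^{t/d} \equiv 1 \pmod m,$$
so that if $\operatorname{ord}_m a^n = t'$, then
$$t' \,\Big|\, \frac{t}{d}. \qquad (1)$$
But from the congruence
$$(a^n)^{t'} \equiv 1 \pmod m,$$
we have that $t \mid nt'$, or
$$\frac{t}{d} \,\Big|\, \frac{n}{d}\, t'.$$
Since
$$\left(\frac{t}{d}, \frac{n}{d}\right) = 1,$$
this gives
$$\frac{t}{d} \,\Big|\, t'. \qquad (2)$$
Together, (1) and (2) give $t' = t/d$.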
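Theorem 4-1 can be verified numerically for small moduli. In this sketch the helper `order` computes orders by direct search and `check_theorem_4_1` compares both sides of the theorem for every $a$ prime to $m$; both names are illustrative.

```python
from math import gcd

def order(a: int, m: int) -> int:
    """Least positive t with a**t = 1 (mod m); assumes (a, m) = 1."""
    t, power = 1, a % m
    while power != 1 % m:
        power = power * a % m
        t += 1
    return t

def check_theorem_4_1(m: int) -> bool:
    """Check ord(a**n) = t / (n, t) for all a prime to m and 1 <= n <= 2t."""
    for a in range(1, m):
        if gcd(a, m) != 1:
            continue
        t = order(a, m)
        for n in range(1, 2 * t + 1):
            if order(pow(a, n, m), m) != t // gcd(n, t):
                return False
    return True

print(all(check_theorem_4_1(m) for m in (7, 8, 13, 15, 27)))
```

Note that the theorem is stated for an arbitrary modulus $m$, so composite values are included in the check.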


THEOREM 4-2. If any integer belongs to $t$ (mod $p$), then exactly
$\varphi(t)$ incongruent numbers belong to $t$ (mod $p$).

Proof: Assume that $\operatorname{ord}_p a = t$. Then by Theorem 3-19, $t \mid (p - 1)$,
so that by Theorem 3-21 there are exactly $t$ roots of the congruence
$x^t \equiv 1 \pmod p$. But all the numbers $a, a^2, \ldots, a^t$ are roots of this
congruence and they are incongruent (mod $p$), so that they are the
only roots. By Theorem 4-1, the powers of $a$ which belong to
$t$ (mod $p$) are the numbers $a^n$ with $(n, t) = 1$, $1 \le n \le t$, and there
are just $\varphi(t)$ of these numbers.


THEOREM 4-3. If $t \mid (p - 1)$, there are $\varphi(t)$ incongruent numbers
(mod $p$) which belong to $t$ (mod $p$).

Proof: Let $d$ run over the divisors of $p - 1$, and for each such $d$ let
$\psi(d)$ be the number of integers among $1, 2, \ldots, p - 1$ of order
$d$ (mod $p$). By Theorem 3-19 and Fermat's theorem, each of the
integers $1, 2, \ldots, p - 1$ belongs to exactly one of the $d$. Hence
$$\sum_{d \mid p-1} \psi(d) = p - 1.$$
But also
$$\sum_{d \mid p-1} \varphi(d) = p - 1,$$
by Theorem 3-9, so that
$$\sum_{d \mid p-1} \psi(d) = \sum_{d \mid p-1} \varphi(d).$$
By Theorem 4-2, the value of $\psi(d)$ is either zero or $\varphi(d)$ for each $d$,
and we deduce from the last equation that $\psi(d) = \varphi(d)$ for each $d$
dividing $p - 1$.

If $\operatorname{ord}_m a = \varphi(m)$, then $a$ is said to be a primitive root of $m$. The
importance of this notion lies in the fact that if $g$ is such a primitive
root, then its powers
$$g, g^2, \ldots, g^{\varphi(m)}$$
are distinct (mod $m$) and are all relatively prime to $m$; they therefore
constitute a reduced residue system modulo $m$. Thus we have a
convenient way of representing all the elements of a reduced residue
system, some of the implications of which are to be found later in this
chapter and in the problems.

It follows immediately from Theorem 4-1 that the other primitive
roots of $m$ are those powers $g^k$ for which $(k, \varphi(m)) = 1$. Either from
this remark or from Theorem 4-3 we have

THEOREM 4-4. There are exactly $\varphi(\varphi(p))$ primitive roots of a prime $p$.
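For a small prime the primitive roots can simply be listed by computing the order of every residue. The sketch below is illustrative (the names `order`, `phi`, and `primitive_roots` are assumptions) and is far too slow for large $p$.

```python
from math import gcd

def order(a: int, m: int) -> int:
    """Least positive t with a**t = 1 (mod m); assumes (a, m) = 1."""
    t, power = 1, a % m
    while power != 1 % m:
        power = power * a % m
        t += 1
    return t

def phi(n: int) -> int:
    """Euler's function, by counting."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def primitive_roots(p: int) -> list:
    """All primitive roots of the prime p among 1..p-1."""
    return [g for g in range(1, p) if order(g, p) == p - 1]

print(primitive_roots(13), phi(phi(13)))
print(sorted(pow(2, k, 13) for k in range(1, 13)))
```

The first line shows that 13 has $\varphi(\varphi(13)) = 4$ primitive roots; the second shows that the powers of the primitive root 2 run through a reduced residue system modulo 13.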




PROBLEMS

1. Show that if $\operatorname{ord}_m a = t$, $\operatorname{ord}_m b = u$, and $(t, u) = 1$, then $\operatorname{ord}_m (ab) = tu$.

2. Show that if $p \equiv 1 \pmod 4$ and $g$ is a primitive root of $p$, then so is
$-g$. Show by a numerical example that this need not be the case
if $p \equiv 3 \pmod 4$.

3. Show that if $p$ is of the form $2^n + 1$ and $(a/p) = -1$, then $a$ is a
primitive root of $p$.

4. Show that if $p$ is an odd prime and $\operatorname{ord}_p a = t > 1$, then
$$\sum_{k=1}^{t-1} a^k \equiv -1 \pmod p.$$


4-2 Primitive roots of composite moduli. Theorem 4-4 immediately
brings the following questions to mind: Do all numbers have
primitive roots? If not, which do, and how many are there? The
first question is easily answered in the negative, since 8 has none:
$\varphi(8) = 4$, but
$$\operatorname{ord}_8 1 = 1, \quad \operatorname{ord}_8 3 = 2, \quad \operatorname{ord}_8 5 = 2, \quad \operatorname{ord}_8 7 = 2.$$
On the other hand, since 5 is a primitive root of 6, there are composite
numbers having primitive roots. The answer to the second
question is, therefore, not just the set of primes, as one might think.
After the primes themselves, the simplest moduli to treat are the
prime powers. We need a preliminary result.

THEOREM 4-5. (a) If $p$ is prime, then
$$a \equiv b \pmod{p^n} \quad \text{implies} \quad a^{p^s} \equiv b^{p^s} \pmod{p^{n+s}} \qquad (3)$$
for every pair of positive integers $n, s$.
(b) If $p$ is an odd prime and $p \nmid b$, then
$$a^{p^s} \equiv b^{p^s} \pmod{p^{n+s}} \quad \text{implies} \quad a \equiv b \pmod{p^n} \qquad (4)$$
for every pair of positive integers $n, s$.


Proof: (a) We use induction on $s$. Assume that $a \equiv b \pmod{p^n}$.
Then
$$a = hp^n + b,$$
and
$$a^p = (hp^n)^p + \binom{p}{1}(hp^n)^{p-1}b + \cdots + \binom{p}{p-1}hp^n b^{p-1} + b^p.$$
Now $p$ occurs in the numerator of the binomial coefficient
$$\binom{p}{k} = \frac{p(p - 1) \cdots (p - k + 1)}{1 \cdot 2 \cdots k},$$
but it is not present in the denominator if $0 < k < p$; hence
$$p \,\Big|\, \binom{p}{k}$$
for $0 < k < p$, and for such $k$,
$$p^{n+1} \,\Big|\, \binom{p}{k} p^{n(p-k)}.$$
But also $p^{n+1} \mid p^{np}$, so that $a^p \equiv b^p \pmod{p^{n+1}}$. Hence (3) is correct
for $s = 1$ and every $n$.

Suppose that (3) is valid for $s = 1, 2, \ldots, s'$, for every $n$, and
suppose that $a \equiv b \pmod{p^n}$. (This congruence is now to be regarded
as the premise of (3) with $s = s' + 1$.) Then the induction hypothesis
with $s = 1$ gives
$$a^p \equiv b^p \pmod{p^{n+1}}. \qquad (5)$$
Using (5) as the premise of (3) with $s = s'$ gives
$$(a^p)^{p^{s'}} \equiv (b^p)^{p^{s'}} \pmod{p^{n+1+s'}},$$
or
$$a^{p^{s'+1}} \equiv b^{p^{s'+1}} \pmod{p^{n+(s'+1)}},$$
which is the conclusion of (3) with $s = s' + 1$. Hence (3) holds for
every pair of positive integers $n, s$.

(b) We first prove (4) for $s = 1$, by induction on $n$; we suppose
throughout that $p \ne 2$ and $p \nmid b$. Thus we wish to show that
$$a^p \equiv b^p \pmod{p^{n+1}} \quad \text{implies} \quad a \equiv b \pmod{p^n}.$$
If $a^p \equiv b^p \pmod{p^2}$, then also $a^p \equiv b^p \pmod p$, and, by Fermat's
theorem, $a \equiv b \pmod p$. Now assume that
$$a^p \equiv b^p \pmod{p^n} \quad \text{implies} \quad a \equiv b \pmod{p^{n-1}}$$
and that
$$a^p \equiv b^p \pmod{p^{n+1}}.$$
Then
$$a^p \equiv b^p \pmod{p^n},$$
so that
$$a \equiv b \pmod{p^{n-1}}.$$
But if $a = up^{n-1} + b$, then
$$a^p \equiv b^p + up^n b^{p-1} \pmod{p^{n+1}}$$
if $p > 2$, and so $p \mid u$, whence $a = u'p^n + b$ and
$$a \equiv b \pmod{p^n},$$
and the implication follows by induction on $n$.

To complete the proof of (4), we use induction on $s$. Assume that
$$a^{p^{s'-1}} \equiv b^{p^{s'-1}} \pmod{p^{n+s'-1}} \quad \text{implies} \quad a \equiv b \pmod{p^n}$$
for every $n$, and assume that
$$a^{p^{s'}} \equiv b^{p^{s'}} \pmod{p^{n+s'}}.$$
Then
$$(a^p)^{p^{s'-1}} \equiv (b^p)^{p^{s'-1}} \pmod{p^{n+s'}},$$
whence
$$a^p \equiv b^p \pmod{p^{n+1}},$$
so that, by what we have just proved,
$$a \equiv b \pmod{p^n}.$$
The result follows by induction on $s$.


Let $p$ be a prime. Then if $p^n \mid a$ and $p^{n+1} \nmid a$, we will write for
brevity $p^n \,\|\, a$.


THEOREM 4-6. If $p$ is an odd prime, $\operatorname{ord}_p a = t$, and $p^z \,\|\, (a^t - 1)$,
then
$$\operatorname{ord}_{p^n} a = t \cdot p^{\max(0,\, n - z)}.$$

Proof: Assume the hypotheses of the theorem are satisfied. If
$n \le z$, then $p^n \mid (a^t - 1)$. This is not true for any exponent $t' < t$,
since if $p^n \mid (a^{t'} - 1)$, then $p \mid (a^{t'} - 1)$, so that $t \mid t'$. Hence in this case
$\operatorname{ord}_{p^n} a = t$, which proves the theorem for $n \le z$.

If $n > z$, we get from Theorem 4-5 and the last hypothesis of the
present theorem that
$$a^{tp^{n-z}} \equiv 1 \pmod{p^n}.$$
We must show that $a^d \not\equiv 1 \pmod{p^n}$ if $d$ is a proper divisor of $tp^{n-z}$.
Let $d = t_1p^r$, where $r \le n - z$ and $t_1 \mid t$, and assume that $a^{t_1p^r} \equiv 1 \pmod{p^n}$.
By Theorem 4-5 again, $a^{t_1} \equiv 1 \pmod{p^{n-r}}$,
whence $a^{t_1} \equiv 1 \pmod p$,
so that $t \mid t_1$, and $t = t_1$. Since $p^z \,\|\, (a^t - 1)$ and $p^{n-r} \mid (a^t - 1)$, we have
$n - r \le z$, whence $n - z = r$.
We can use Theorem 4-6 to construct primitive roots of $p^n$, where p is an odd prime; that is, numbers which belong to $p^{n-1}(p - 1)$ modulo $p^n$. Let g be a primitive root of p. Then if $p^2 \nmid (g^{p-1} - 1)$, Theorem 4-6 shows that

$$\operatorname{ord}_{p^n} g = (p - 1)p^{n-1},$$

and g is also a primitive root of $p^n$ for all positive n. If $p^2 \mid (g^{p-1} - 1)$, then $g + p$ is also a primitive root of p, and

$$(g + p)^{p-1} - 1 \equiv g^{p-1} + (p - 1)g^{p-2}p - 1 \equiv (p - 1)p\,g^{p-2} \not\equiv 0 \pmod{p^2},$$

so that by Theorem 4-6,

$$\operatorname{ord}_{p^n} (g + p) = (p - 1)p^{n-1},$$

and $g + p$ is a primitive root of $p^n$ for all positive n. We have thus proved


THEOREM 4-7. Any power of an odd prime has a primitive root. 
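
The construction behind this theorem can be tried out numerically. The following is a minimal Python sketch and not part of the original text: the helper names `order` and `lift_primitive_root` are hypothetical, and the order is found by brute force, so the sketch is only suitable for small moduli.

```python
from math import gcd

def order(a, m):
    """Multiplicative order of a modulo m (m >= 2, gcd(a, m) == 1), by brute force."""
    if gcd(a, m) != 1:
        raise ValueError("a must be prime to m")
    t, x = 1, a % m
    while x != 1:
        x = x * a % m
        t += 1
    return t

def lift_primitive_root(g, p):
    """Given a primitive root g of the odd prime p, return g or g + p,
    whichever is a primitive root of every power of p."""
    if order(g, p) != p - 1:
        raise ValueError("g must be a primitive root of p")
    # g itself works unless p^2 divides g^(p-1) - 1; in that case use g + p.
    return g + p if pow(g, p - 1, p * p) == 1 else g

if __name__ == "__main__":
    for g, p in [(2, 3), (2, 5), (7, 5), (3, 7)]:
        h = lift_primitive_root(g, p)
        for n in (1, 2, 3):
            # The order of h modulo p^n should be phi(p^n) = (p - 1) p^(n - 1).
            assert order(h, p ** n) == (p - 1) * p ** (n - 1)
        print(f"p = {p}: starting from g = {g}, use {h}")
```

The pair g = 7, p = 5 exercises the second case: 7 is a primitive root of 5, but 7^4 = 2401 is congruent to 1 (mod 25), so 12 is used instead.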


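Turning now to other composite numbers, it is convenient to define a function $\lambda(m)$, called the universal exponent of m:

$$\lambda(1) = 1,$$

$$\lambda(2^\alpha) = \begin{cases} \varphi(2^\alpha) & \text{if } \alpha = 1, 2, \\ \tfrac12\varphi(2^\alpha) & \text{if } \alpha > 2, \end{cases}$$

$$\lambda(p^\alpha) = \varphi(p^\alpha), \quad p \text{ an odd prime},$$

$$\lambda(2^\alpha p_1^{\alpha_1} \cdots p_r^{\alpha_r}) = \langle \lambda(2^\alpha), \lambda(p_1^{\alpha_1}), \ldots, \lambda(p_r^{\alpha_r}) \rangle, \quad p_1, \ldots, p_r \text{ distinct odd primes}.$$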
Euler's theorem can now be strengthened somewhat.

THEOREM 4-8. If $(a, m) = 1$, then

$$a^{\lambda(m)} \equiv 1 \pmod m.$$

Proof: (a) If $m = 2^\alpha$ with $\alpha \le 2$, this is Euler's theorem.

(b) If $m = 2^\alpha$ with $\alpha > 2$, a must be odd, so that $a^2 \equiv 1 \pmod{2^3}$. By Theorem 4-5, $(a^2)^{2^{\alpha-3}} = a^{2^{\alpha-2}} \equiv 1 \pmod{2^\alpha}$.

(c) If $m = p^\alpha$, where p is odd, we have Euler's theorem again.

(d) Finally, suppose that $m = 2^\alpha p_1^{\alpha_1} \cdots p_r^{\alpha_r}$. By (a), (b) and (c), each of the congruences

$$a^{\lambda(2^\alpha)} \equiv 1 \pmod{2^\alpha},$$
$$a^{\lambda(p_i^{\alpha_i})} \equiv 1 \pmod{p_i^{\alpha_i}}, \quad i = 1, 2, \ldots, r,$$

holds. Since all the exponents of a divide $\lambda(m)$, it follows that

$$a^{\lambda(m)} \equiv 1 \pmod{2^\alpha},$$
$$a^{\lambda(m)} \equiv 1 \pmod{p_i^{\alpha_i}}, \quad i = 1, 2, \ldots, r,$$

and hence

$$a^{\lambda(m)} \equiv 1 \pmod m.$$
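
As an illustration of the universal exponent and of Theorem 4-8, here is a small Python sketch that is not part of the original text. The function names (`factorize`, `universal_exponent`, `check_theorem_4_8`) are hypothetical, factorization is by trial division, and the check is exhaustive, so it is meant only for small m.

```python
from math import gcd

def factorize(m):
    """Prime factorization of m as {prime: exponent}, by trial division."""
    factors, d = {}, 2
    while d * d <= m:
        while m % d == 0:
            factors[d] = factors.get(d, 0) + 1
            m //= d
        d += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors

def universal_exponent(m):
    """lambda(m): the LCM of the values lambda(p^e) over the prime powers in m."""
    lam = 1
    for p, e in factorize(m).items():
        phi = (p - 1) * p ** (e - 1)
        # For powers of 2 beyond 4 the exponent is half of phi.
        part = phi // 2 if (p == 2 and e > 2) else phi
        lam = lam * part // gcd(lam, part)
    return lam

def check_theorem_4_8(m):
    """True if a^lambda(m) == 1 (mod m) for every a prime to m."""
    lam = universal_exponent(m)
    return all(pow(a, lam, m) == 1 % m
               for a in range(1, m + 1) if gcd(a, m) == 1)

if __name__ == "__main__":
    for m in (8, 16, 24, 100, 561):
        print(m, universal_exponent(m), check_theorem_4_8(m))
```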
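As a complement to Theorem 4-8, we have

THEOREM 4-9. $\lambda(m)$ is the smallest positive value of x such that $a^x \equiv 1 \pmod m$ for every a prime to m. That is, there is always an integer which belongs to $\lambda(m)$ (mod m).

Proof: (a) If $m = 1$: $\lambda(1) = 1$ and $\operatorname{ord}_1 1 = 1$.

(b) If $m = 2$: $\lambda(2) = 1$ and $\operatorname{ord}_2 1 = 1$.

(c) If $m = 4$: $\lambda(4) = 2$ and $\operatorname{ord}_4 3 = 2$.

(d) If $m = 2^\alpha$, $\alpha > 2$: $\lambda(2^\alpha) = 2^{\alpha-2}$ and $\operatorname{ord}_{2^\alpha} 5 = 2^{\alpha-2}$. For if $\operatorname{ord}_{2^\alpha} 5 = d$, then $d \mid 2^{\alpha-2}$, so that $d = 2^\beta$, where $\beta \le \alpha - 2$. But it is easily proved by induction on $\alpha$ that for $\alpha \ge 3$,

$$5^{2^{\alpha-3}} = 1 + 2^{\alpha-1} h_\alpha,$$

where $h_\alpha$ is an odd number. Hence $5^{2^{\alpha-3}} \not\equiv 1 \pmod{2^\alpha}$ and $\beta = \alpha - 2$.

(e) If $m = p^\alpha$ with p odd: $\lambda(p^\alpha) = \varphi(p^\alpha)$, and by Theorem 4-7, $p^\alpha$ has a primitive root.

(f) If m is arbitrary: Let $m = p_1^{\alpha_1} \cdots p_r^{\alpha_r}$, with $2 \le p_1 < \cdots < p_r$. By the first five steps of the proof, there are numbers $a_1, \ldots, a_r$ such that $\operatorname{ord}_{p_i^{\alpha_i}} a_i = \lambda(p_i^{\alpha_i})$ for $i = 1, \ldots, r$. By the Chinese Remainder Theorem, there is a single integer a such that $a \equiv a_i \pmod{p_i^{\alpha_i}}$ for $i = 1, \ldots, r$, and the order of a (mod $p_i^{\alpha_i}$) is the same as that of $a_i$, for each i. Hence if $a^x \equiv 1 \pmod m$, then $\lambda(p_i^{\alpha_i}) \mid x$ for each i, and so $\lambda(m)$, since it is the LCM of the numbers $\lambda(p_i^{\alpha_i})$, also divides x. By Theorem 4-8, $\operatorname{ord}_m a = \lambda(m)$.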




An integer whose order (mod m) is $\lambda(m)$ is called a primitive $\lambda$-root of m. Theorem 4-9 says in effect that every modulus has a primitive $\lambda$-root.

As a combination of Theorems 4-2 and 4-9, we have

THEOREM 4-10. There are $\varphi(\lambda(m))$ primitive $\lambda$-roots of m congruent to powers of any given primitive $\lambda$-root.


Notice that in general it is not the case that all the primitive $\lambda$-roots are congruent to powers of a single one. For example, if $m = 2^4$, $g = 5$, then the only other primitive $\lambda$-root congruent to a power of 5 is 13, while 3 and 11 are also primitive $\lambda$-roots.
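
This example can be checked directly for the modulus 2^4 = 16. The sketch below is illustrative only and not from the original text; the names `order` and `primitive_lambda_roots` are hypothetical, and the universal exponent is taken as the largest order that occurs, which is justified by Theorems 4-8 and 4-9.

```python
from math import gcd

def order(a, m):
    """Multiplicative order of a modulo m (m >= 2, gcd(a, m) == 1)."""
    t, x = 1, a % m
    while x != 1:
        x = x * a % m
        t += 1
    return t

def primitive_lambda_roots(m):
    """Return (lambda(m), sorted list of the primitive lambda-roots of m below m)."""
    units = [a for a in range(1, m) if gcd(a, m) == 1]
    orders = {a: order(a, m) for a in units}
    lam = max(orders.values())  # the largest order occurring equals lambda(m)
    return lam, [a for a in units if orders[a] == lam]

if __name__ == "__main__":
    lam, roots = primitive_lambda_roots(16)
    powers_of_5 = {pow(5, k, 16) for k in range(1, lam + 1)}
    print("lambda(16) =", lam)
    print("primitive lambda-roots:", roots)
    print("those that are powers of 5:", [a for a in roots if a in powers_of_5])
```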

Moreover, we can now deduce 


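THEOREM 4-11. The numbers having primitive roots are

$$1, \; 2, \; 4, \; p^\alpha, \; 2p^\alpha,$$

where p is any odd prime.

Proof: We already know that 1, 2, 4, and $p^\alpha$ have primitive roots. Since

$$\lambda(2p^\alpha) = \langle \lambda(2), \lambda(p^\alpha) \rangle = \lambda(p^\alpha) = \varphi(p^\alpha) = \varphi(2p^\alpha),$$

every number $2p^\alpha$ has primitive roots. On the other hand, if $m = 2^\alpha p_1^{\alpha_1} \cdots p_r^{\alpha_r}$ with $\alpha > 2$, $p_i$ odd, $r \ge 1$, then

$$\lambda(m) \le \tfrac12 \varphi(2^\alpha)\varphi(p_1^{\alpha_1}) \cdots \varphi(p_r^{\alpha_r}) = \tfrac12\varphi(m),$$

and if

$$m = p_1^{\alpha_1} \cdots p_r^{\alpha_r} \quad\text{with } r > 1$$

or if

$$m = 4p_1^{\alpha_1} \cdots p_r^{\alpha_r} \quad\text{with } r \ge 1,$$

then each of the numbers $\lambda(4)$, $\lambda(p_i^{\alpha_i})$ is even, so that again

$$\lambda(m) \le \tfrac12\varphi(m).$$

This completes the proof.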


The problem of efficiently finding a primitive root of a given large modulus q is not simple. It is, of course, a finite problem, and for a specific modulus can be solved by successively testing the elements of a reduced residue system. A slightly more rapid method is indicated in Problem 4 at the end of the next section, but it also is laborious for large q, particularly if $\varphi(q)$ has many distinct prime divisors.




PROBLEMS 


1. Show that if q has primitive roots, there are $\varphi(\varphi(q))$ of them, and their product is congruent to 1 (mod q) if $q > 6$. [Hint: Represent all the primitive roots in terms of a single one.]

2. Find all the primitive roots of 25.

*3. It is an unproved conjecture that no two consecutive integers, except 8 and 9, are perfect powers. Show that at any rate the only pair x, y satisfying the conditions

$$3^x - 2^y = 1, \quad x > 1, \; y > 1$$

is 2, 3. [Hint: Use Theorem 4-6 and Problem 3, Section 3-7 to show that $3^{x-1} \mid y$.]

4. Show that if g is a primitive root of $p^2$, then the roots of the congruence

$$x^{p-1} \equiv 1 \pmod{p^2}$$

are $g^{np}$, $n = 1, 2, \ldots, p - 1$; that is, that these numbers are distinct roots, and there are no others. [Hint: Show that the congruence has only $p - 1$ roots. Cf. Problem 9, Section 3-7.]


4-3 Indices. Let q be a number having primitive roots and let g be one of them. Then the numbers $g, g^2, \ldots, g^{\varphi(q)}$ are distinct (mod q), and they are all prime to q; therefore they constitute a reduced residue system (mod q). The relation between a number a and the exponent of a power of g which is congruent to a (mod q) is very similar to the relation between an ordinary positive real number x and its logarithm. This exponent is called an index of a to the base g, and written "$\operatorname{ind}_g a$." That is, $\operatorname{ind}_g a$ will stand for any number t such that $g^t \equiv a \pmod q$; it is defined only if $(a, q) = 1$, and is unique modulo $\varphi(q)$. The following facts are immediate consequences of the definition.


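THEOREM 4-12. If g is a primitive root of q and $a \equiv b \pmod q$, then

$$\operatorname{ind}_g a \equiv \operatorname{ind}_g b \pmod{\varphi(q)},$$
$$\operatorname{ind}_g (ab) \equiv \operatorname{ind}_g a + \operatorname{ind}_g b \pmod{\varphi(q)},$$

and

$$\operatorname{ind}_g a^n \equiv n \operatorname{ind}_g a \pmod{\varphi(q)}.$$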


The procedure for finding the indices of the elements of a reduced residue system is quite simple if a primitive root is known. If g is a primitive root of q, construct a table of two rows and $\varphi(q)$ columns, of which the second row consists of the integers $1, 2, \ldots, \varphi(q)$ in order. In the first row enter g in the first column. Multiply this by g and reduce modulo q for the element in the second column, multiply this result by g and reduce modulo q for the element in the third column, etc. (When the table is complete, the last element in the first row should be 1.) Then the index of any element of the first row appears directly below that element.
If, for example, $q = 17$ and $g = 3$, we have the table

    a:      3   9  10  13   5  15  11  16  14   8   7   4  12   2   6   1
    ind a:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16

while if $q = 18$ and $g = 5$, we have

    a:      5   7  17  13  11   1
    ind a:  1   2   3   4   5   6

By Theorem 4-1, if $\operatorname{ord}_m g = \varphi(m)$, then

$$\operatorname{ord}_m g^n = \frac{\varphi(m)}{(n, \varphi(m))},$$

so that a is a primitive root of m if and only if $(\operatorname{ind} a, \varphi(m)) = 1$. Thus in the above table we see that the primitive roots of 18 are 5 and 11, since the only numbers less than $\varphi(18) = 6$ and prime to it are 1 and 5.
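
The table-building procedure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than anything from the original text; `index_table` and `primitive_roots_from_table` are hypothetical names, and the caller is assumed to supply a genuine primitive root g of q.

```python
from math import gcd

def euler_phi(q):
    """phi(q), counted directly; adequate for small q."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)

def index_table(g, q):
    """Return {a: ind_g a} by repeatedly multiplying by g modulo q."""
    table, a = {}, 1
    for t in range(1, euler_phi(q) + 1):
        a = a * g % q
        table[a] = t
    return table

def primitive_roots_from_table(g, q):
    """a is a primitive root of q exactly when (ind a, phi(q)) = 1."""
    phi = euler_phi(q)
    return sorted(a for a, t in index_table(g, q).items() if gcd(t, phi) == 1)

if __name__ == "__main__":
    print(index_table(3, 17))
    print(index_table(5, 18))
    print(primitive_roots_from_table(5, 18))
```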

Indices are quite useful in solving binomial congruences. For example, the congruence

$$10x \equiv 8 \pmod{18}$$

implies

$$5x \equiv 4 \pmod 9,$$

which implies

$$\operatorname{ind} 5 + \operatorname{ind} x \equiv \operatorname{ind} 4 \pmod 6,$$
$$\operatorname{ind} x \equiv \operatorname{ind} 4 - \operatorname{ind} 5 \pmod 6.$$

Since 2 is a primitive root of 9, we construct the table as before:

    a:      2   4   8   7   5   1
    ind a:  1   2   3   4   5   6

Thus $\operatorname{ind} x \equiv 2 - 5 \equiv 3 \pmod 6$, whence $x \equiv 8 \pmod 9$, so that $x \equiv 8$ or $17 \pmod{18}$.


The investigation of the congruence $x^n \equiv c \pmod m$, where $(m, c) = 1$, can be reduced to the study of the solutions of

$$x^n \equiv c \pmod p$$

by previously explained methods. But the latter is entirely equivalent to

$$n \cdot \operatorname{ind} x \equiv \operatorname{ind} c \pmod{p - 1},$$

which has solutions if and only if $(n, p - 1) \mid \operatorname{ind} c$; if this condition is satisfied there are $d = (n, p - 1)$ roots. This criterion has the disadvantage that it requires knowledge of the value of $\operatorname{ind} c$; the following is more useful.


THEOREM 4-13. Let $(c, q) = 1$, q being any number which has primitive roots. Then a necessary and sufficient condition that the congruence

$$x^n \equiv c \pmod q \qquad (6)$$

be solvable is that

$$c^{\varphi(q)/d} \equiv 1 \pmod q,$$

where $d = (n, \varphi(q))$.

Proof: By an argument similar to that just given for prime modulus, a necessary and sufficient condition for the solvability of (6) is that $\operatorname{ind} c \equiv 0 \pmod d$. This is equivalent to

$$\frac{\varphi(q)}{d} \operatorname{ind} c \equiv 0 \pmod{\varphi(q)},$$

or, what is the same thing,

$$c^{\varphi(q)/d} \equiv 1 \pmod q.$$

If $x^n \equiv c \pmod m$ is solvable, and $(m, c) = 1$, c is said to be an nth power residue of m, otherwise a nonresidue.


THEOREM 4-14. The number of incongruent nth power residues of q is $\varphi(q)/d$, and these residues are the roots of the congruence

$$x^{\varphi(q)/d} \equiv 1 \pmod q.$$

Proof: The second statement is a paraphrase of Theorem 4-13. Since q has a primitive root g, the roots of the congruence $x^{\varphi(q)/d} \equiv 1 \pmod q$ are the numbers $g^t$ for which

$$g^{t\varphi(q)/d} \equiv 1 \pmod q,$$

and this requires that $d \mid t$. But the number of multiples t of d with $1 \le t \le \varphi(q)$ is exactly $\varphi(q)/d$. (Note that this is a generalization of Theorem 3-24.)
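
Theorems 4-13 and 4-14 lend themselves to a direct numerical check. The Python sketch below is illustrative and not from the original text; the function names are hypothetical, and the brute-force enumeration is intended only for small moduli q that have primitive roots.

```python
from math import gcd

def euler_phi(q):
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)

def nth_power_residues(n, q):
    """Brute force: all values of x^n (mod q) for x prime to q."""
    return {pow(x, n, q) for x in range(1, q + 1) if gcd(x, q) == 1}

def residues_by_criterion(n, q):
    """Criterion of Theorem 4-13: c qualifies when c^(phi(q)/d) == 1 (mod q),
    where d = (n, phi(q)). Valid only if q has primitive roots."""
    phi = euler_phi(q)
    d = gcd(n, phi)
    return {c for c in range(1, q + 1)
            if gcd(c, q) == 1 and pow(c, phi // d, q) == 1}

if __name__ == "__main__":
    for q in (7, 9, 13, 18, 25):
        for n in (2, 3, 4):
            d = gcd(n, euler_phi(q))
            found = nth_power_residues(n, q)
            assert found == residues_by_criterion(n, q)
            assert len(found) == euler_phi(q) // d
    print("criterion and count agree on the sample moduli")
```

The hypothesis that q has primitive roots matters: for q = 8 and n = 2 the criterion accepts every odd c, although 1 is the only odd square modulo 8.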


PROBLEMS 


1. Show that if g and h are primitive roots of p, then

$$\operatorname{ind}_h a \equiv \operatorname{ind}_g a \cdot \operatorname{ind}_h g \pmod{p - 1}.$$

2. Given that 2 is a primitive root of 29, construct a table of indices, and use it to solve the following congruences:

(a) $17x \equiv 10 \pmod{29}$   (b) $17x^2 \equiv 10 \pmod{29}$.

3. Develop a method for solving the congruence

$$Ax^2 + Bx + C \equiv 0 \pmod p$$

by use of indices, when p is an odd prime which does not divide A. (First show that the given congruence can be replaced by one in which the coefficient of $x^2$ is 1; then, after suitable modifications, complete the square.) Apply your method to

(a) $17x^2 - 3x + 10 \equiv 0 \pmod{29}$   (b) $17x^2 - 4x + 10 \equiv 0 \pmod{29}$.

4. Let q be a number having primitive roots. Show that h is a primitive root of q if and only if h is an rth power nonresidue of q for every prime r dividing $\varphi(q)$. [Hint: Write $h = g^k$, where g is a primitive root of q, and show that each of the allegedly equivalent statements is equivalent to the equation $(k, \varphi(q)) = 1$.] By eliminating all the appropriate powers of the elements of a reduced residue system, find all the primitive roots of (a) 13, (b) 29. (Cf. Problem 3, Section 4-1.)

*5. Show that for $x > 1$ the quantity

$$f(x) = \frac{y^q - 1}{y - 1} = (y - 1)^{q-1} + \binom{q}{1}(y - 1)^{q-2} + \cdots + \binom{q}{q-2}(y - 1) + q,$$

where q is prime, $q^n > 2$, and $y = x^{q^{n-1}}$, has the following properties:

(a) For $x \equiv 1 \pmod q$, $q \| f(x)$, and for $x \not\equiv 1 \pmod q$, $q \nmid f(x)$.
(b) $f(x) > q$.
(c) $(f(x), x) = 1$.
(d) If $p \ne q$ and $p \mid f(x)$, then $p \equiv 1 \pmod{q^n}$.

Deduce that there exists a prime $p \equiv 1 \pmod{q^n}$, and then by taking $x = p_1 \cdots p_r$, where each $p_i \equiv 1 \pmod{q^n}$, that there are infinitely many primes $p \equiv 1 \pmod{q^n}$. (Cf. Problem 4, Section 3-7, where $q = 2$.)


4-4 An application to Fermat's conjecture. A simple way of attempting to show that the equation

$$x^n + y^n = z^n \qquad (7)$$

has no nonzero solutions for $n \ge 3$ is to show that the infinitely many congruences

$$x^n + y^n \equiv z^n \pmod p, \quad p = 2, 3, 5, 7, \ldots,$$

impose absurd conditions on the variables. For example, in the case $n = 3$ the congruence

$$x^3 + y^3 \equiv z^3 \pmod 7$$

implies that $7 \mid xyz$. For if $7 \nmid u$, then $u^6 \equiv 1 \pmod 7$, so that $u^3 \equiv \pm 1 \pmod 7$, and for no choice of signs is $\pm 1 \pm 1 \equiv \pm 1 \pmod 7$. If we could find infinitely many primes p such that

$$x^3 + y^3 \equiv z^3 \pmod p$$

implies $p \mid xyz$, then clearly equation (7) could have no nonzero solution for $n = 3$. We shall show that this cannot be done, either for $n = 3$ or for larger n. The proof depends on the following combinatorial lemma.
THEOREM 4-15. If the numbers $1, 2, \ldots, N$ are distributed into m disjoint classes, and if* $N > m!\,e$, then at least one class contains the difference of two of its elements.

*Here, contrary to our convention, the number e = 2.718... is not an integer.

Proof: Suppose that the numbers $1, 2, \ldots, N$ have been put into m disjoint classes so that no class contains the difference of any two of its elements. Let a class having the largest number of elements be called $Z_1$; then if $Z_1$ is composed of $x_1, \ldots, x_{n_1}$, we have $N \le n_1 m$. If the names are so chosen that $x_1 < x_2 < \cdots < x_{n_1}$, the $n_1 - 1$ differences

$$x_2 - x_1, \; x_3 - x_1, \; \ldots, \; x_{n_1} - x_1 \qquad (8)$$

are also integers between 1 and N, inclusive, and by assumption they lie in the remaining $m - 1$ classes. Let $Z_2$ be a class in which the largest number of differences (8) lie. If $Z_2$ contains the $n_2$ differences

$$x_\alpha - x_1, \; x_\beta - x_1, \; \ldots, \qquad (9)$$

then clearly $n_1 - 1 \le n_2(m - 1)$. Now the $n_2 - 1$ differences

$$x_\beta - x_\alpha, \; x_\gamma - x_\alpha, \; \ldots \qquad (10)$$

do not lie in either $Z_1$ or $Z_2$, so they must be distributed among the remaining $m - 2$ classes. If $n_3$ is the largest number of differences (10) in any single class, then $n_2 - 1 \le n_3(m - 2)$. Continuing in this way, we have

$$n_\mu - 1 \le n_{\mu+1}(m - \mu), \qquad (11)$$

for $\mu = 1, 2, \ldots, m_1 - 1$, where $m_1$ is such that $n_{m_1} = 1$. From (11), we have

$$\frac{n_\mu}{(m - \mu)!} \le \frac{1}{(m - \mu)!} + \frac{n_{\mu+1}}{(m - \mu - 1)!}, \quad \mu = 1, 2, \ldots, m_1 - 1,$$

and adding all these inequalities gives

$$\frac{n_1}{(m - 1)!} \le \frac{1}{(m - 1)!} + \frac{1}{(m - 2)!} + \cdots + \frac{1}{(m - m_1)!} < e.$$

Hence

$$N \le n_1 m < m!\,e,$$

and the proof is complete.
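
For very small m the lemma can be confirmed by exhaustive search. The Python sketch below is illustrative only and not part of the original text; the function names are hypothetical, and since the search visits all m^N distributions it is practical only for m = 1 or 2.

```python
from itertools import product
from math import e, factorial

def has_difference_class(classes):
    """classes[i] is the class of the number i + 1. Return True if some class
    contains two of its elements x > y together with their difference x - y."""
    n = len(classes)
    for y in range(1, n + 1):
        for x in range(y + 1, n + 1):
            if classes[x - 1] == classes[y - 1] == classes[x - y - 1]:
                return True
    return False

def every_distribution_has_difference(N, m):
    """Check every way of distributing 1..N into m classes."""
    return all(has_difference_class(c) for c in product(range(m), repeat=N))

def smallest_N_in_lemma(m):
    """Smallest integer N with N > m! e."""
    return int(factorial(m) * e) + 1

if __name__ == "__main__":
    for m in (1, 2):
        N = smallest_N_in_lemma(m)
        print(m, N, every_distribution_has_difference(N, m))
```

The bound m!e is sufficient but not sharp: for m = 2 the conclusion already holds with N = 5, while the distribution {1, 4}, {2, 3} shows that it fails for N = 4.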


THEOREM 4-16. There are only finitely many primes p for which every solution of the congruence

$$x^n + y^n \equiv z^n \pmod p \qquad (12)$$

is such that $p \mid xyz$. More precisely, if $p > n!\,e + 1$, then (12) has solutions such that $p \nmid xyz$.

Proof: First suppose that $n \mid (p - 1)$, so that $p - 1 = nr$ for suitable r. Let g be a primitive root of p, and let $s_\mu$ be the smallest positive residue (mod p) of $g^\mu$. Then the numbers $s_1, \ldots, s_{p-1}$ are the integers $1, 2, \ldots, p - 1$ in some order. We now classify the numbers $s_\mu$ according to the residue classes of their subscripts (mod n), so that for each t with $0 \le t \le n - 1$, the numbers

$$s_t, \; s_{t+n}, \; \ldots, \; s_{t+(r-1)n}$$

form a single class, there being n classes altogether. By Theorem 4-15, if $p - 1 > n!\,e$, then some class contains three elements, say $s_{t+jn}$, $s_{t+kn}$, $s_{t+ln}$, such that

$$s_{t+jn} - s_{t+kn} = s_{t+ln}.$$

But then

$$g^{t+jn} \equiv g^{t+kn} + g^{t+ln} \pmod p,$$

whence

$$g^{jn} \equiv g^{kn} + g^{ln} \pmod p,$$

and the numbers $x = g^k$, $y = g^l$, $z = g^j$ give the desired solution of (12).

If $n \nmid (p - 1)$, let $d = (n, p - 1)$. Then by what we have just proved, the congruence

$$x^d + y^d \equiv z^d \pmod p$$

is solvable with $p \nmid xyz$ if $p - 1 > d!\,e$. But by Theorem 4-13, any dth power is an nth power residue of p, since

$$(u^d)^{(p-1)/(n,\,p-1)} = (u^d)^{(p-1)/d} = u^{p-1} \equiv 1 \pmod p.$$

Thus there exist $x_1$, $y_1$, and $z_1$ such that

$$x_1^n \equiv x^d, \quad y_1^n \equiv y^d, \quad z_1^n \equiv z^d \pmod p,$$

and hence

$$x_1^n + y_1^n \equiv z_1^n \pmod p.$$
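
The statement of Theorem 4-16 can be explored by brute force for small n and p. The Python sketch below is illustrative and not from the original text; `nontrivial_solution` and `threshold` are hypothetical names, and the search simply tries all pairs x, y rather than following the constructive proof.

```python
from math import e, factorial

def nontrivial_solution(n, p):
    """Return (x, y, z) with x^n + y^n == z^n (mod p) and p not dividing xyz,
    or None if no such triple exists. Exhaustive search over 1 <= x <= y < p."""
    root_of = {}
    for z in range(1, p):
        root_of.setdefault(pow(z, n, p), z)  # remember one z for each nth power
    for x in range(1, p):
        for y in range(x, p):
            s = (pow(x, n, p) + pow(y, n, p)) % p
            if s in root_of:
                return x, y, root_of[s]
    return None

def threshold(n):
    """The bound n! e + 1 of Theorem 4-16 (a float)."""
    return factorial(n) * e + 1

if __name__ == "__main__":
    print("threshold for n = 3:", threshold(3))
    for p in (7, 13, 19, 31, 37):
        print(p, nontrivial_solution(3, p))
```

For n = 3 the bound is about 17.3, so the theorem guarantees a solution for every prime from 19 on; the primes 7 and 13, which lie below the bound, admit none.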


PROBLEM

Show that if $x^3 + y^3 \equiv z^3 \pmod 9$, then $3 \mid xyz$. Use the result of this section, together with the method of Section 3-5, to show that this is an atypical phenomenon: that for fixed n, the congruence

$$x^n + y^n \equiv z^n \pmod{p^\alpha}$$

has a solution such that $p \nmid xyz$, if p is sufficiently large and $\alpha \ge 1$.


REFERENCES

Section 4-4

The main theorem was first proved by L. E. Dickson, Journal für die Reine und Angewandte Mathematik (Berlin) 135, 134-141, 181-188 (1909). The proof given here is due to I. Schur, Jahresbericht der Deutschen Mathematikervereinigung (Leipzig) 25, 114-117 (1917). Dickson's proof is more difficult, but shows that equation (12) is solvable if $p > cn^4$ for suitable c.


CHAPTER 5 


QUADRATIC RESIDUES 


5-1 Introduction. The subject of nth power residues is a large and difficult one. It happens, however, that for the case $n = 2$ many elegant and important results can be obtained by elementary considerations, and it is to these that we now turn our attention. A fundamental tool in the investigation of quadratic residues is Euler's criterion, Theorem 3-24, which was generalized somewhat in Theorem 4-13, namely that a necessary and sufficient condition for a number a prime to q to be a quadratic residue of q is that $a^{\varphi(q)/2} \equiv 1 \pmod q$. (Here q is a number greater than 2 having primitive roots.) The two problems with which we shall deal are, first, to extend this criterion to general composite moduli (in so doing we shall find that it suffices to restrict our attention to odd prime moduli); and, second, to find an efficient method for determining all the primes of which a given integer a is a quadratic residue.

The prime 2 plays a rather special role in the theory of quadratic 
residues, not so much because of an intrinsic difference between it and 
the odd primes (which does exist, as we saw in the discussion of 
primitive roots of composite moduli) as because the congruences are 
quadratic; in a similar fashion, 3 must be treated separately when 
considering cubic congruences. On account of this, we shall use the 
symbol p to represent an odd prime throughout this chapter. 


5-2 Composite moduli 


THEOREM 5-1. A number a prime to m is a quadratic residue of m if and only if it is a quadratic residue of all odd prime divisors of m and is congruent to 1 (mod 4) if $m \equiv 4 \pmod 8$, and congruent to 1 (mod 8) if $8 \mid m$.

Proof: Let

$$m = 2^\alpha \prod_{i=1}^{r} p_i^{\alpha_i}.$$

Then the congruence

$$x^2 \equiv a \pmod m$$

is equivalent to the system of congruences

$$x^2 \equiv a \pmod{2^\alpha},$$
$$x^2 \equiv a \pmod{p_i^{\alpha_i}}, \quad i = 1, \ldots, r,$$

so that a is a quadratic residue of m if and only if it is a quadratic residue of every prime-power divisor of m.

(a) If a is a quadratic residue of p, it is a quadratic residue of $p^\alpha$, and conversely. (The converse is trivial.) For if a is a residue of p, it follows from Euler's criterion that

$$a^{(p-1)/2} \equiv 1 \pmod p.$$

By Theorem 4-5,

$$a^{p^{\alpha-1}(p-1)/2} \equiv 1 \pmod{p^\alpha},$$

and so a is a quadratic residue of $p^\alpha$ by Euler's criterion again. If a is a quadratic residue of p and $x_1^2 \equiv a \pmod p$, then also $(-x_1)^2 \equiv a \pmod p$, and these are the only solutions, by Lagrange's theorem. Using the method of Section 3-5, it is easily seen that if $p \nmid f'(x_1)$ for each root $x_1$ of $f(x) \equiv 0 \pmod p$, then the congruence $f(x) \equiv 0 \pmod{p^\alpha}$ has exactly as many roots as the congruence with prime modulus. In this case $f(x) = x^2 - a$, $f'(x) = 2x$, and since p is odd and $p \nmid a$, it follows that $x^2 \equiv a \pmod{p^\alpha}$ has exactly two solutions (mod $p^\alpha$) if a is a quadratic residue of p.

(b) For modulus $2^\alpha$ the situation is more complicated: If a is odd,

(i) $x^2 \equiv a \pmod 2$ is always uniquely solvable,
(ii) $x^2 \equiv a \pmod 4$ is solvable if and only if $a \equiv 1 \pmod 4$; it then has two roots,
(iii) $x^2 \equiv a \pmod{2^\alpha}$, for $\alpha \ge 3$, is solvable if and only if $a \equiv 1 \pmod 8$; it then has four roots.

The first statement is obvious, and the second follows immediately upon noting that any odd square is congruent to 1 (mod 4). (The two solutions are, of course, $\pm 1$.) For the case $\alpha \ge 3$, recall that it was shown in the proof of Theorem 4-9 that 5 is a primitive $\lambda$-root of $2^\alpha$, and so the numbers $5, 5^2, 5^3, \ldots, 5^{2^{\alpha-2}}$ are distinct (mod $2^\alpha$). Since by the binomial theorem $(1 + 2^2)^n \equiv 1 + 4n \pmod 8$, $5^n \equiv 1 \pmod 8$ if and only if n is even. Thus $5^2, 5^4, 5^6, \ldots, 5^{2^{\alpha-2}}$ are $2^{\alpha-3}$ numbers which are distinct (mod $2^\alpha$) and which are all congruent to 1 (mod 8). But since there are exactly $2^{\alpha-3}$ numbers in a complete residue system (mod $2^\alpha$) which are congruent to 1 (mod 8), it follows that every number congruent to 1 (mod 8) is congruent to $5^{2n}$ for some n, so that every $a \equiv 1 \pmod 8$ is a quadratic residue of $2^\alpha$. (If $5^{2n} \equiv a \pmod{2^\alpha}$, the congruence $x^2 \equiv a \pmod{2^\alpha}$ has the solution $x = 5^n$.) On the other hand, every odd square is of the form $8k + 1$, so that $x^2 \equiv a \pmod{2^\alpha}$ is certainly not solvable unless $a \equiv 1 \pmod 8$.

Assume that $b^2 \equiv a \pmod{2^\alpha}$, and let x be any other solution of this congruence, so that also $x^2 \equiv a \pmod{2^\alpha}$. Then $x^2 - b^2 = (x - b)(x + b) \equiv 0 \pmod{2^\alpha}$. Both x and b are odd, so $x - b$ and $x + b$ are even; since $(x - b, x + b) = 2(x, b)$, one of them has 2 as a simple factor. Since

$$\frac{x - b}{2} \cdot \frac{x + b}{2} \equiv 0 \pmod{2^{\alpha-2}},$$

one factor must be divisible by $2^{\alpha-2}$, that is,

$$\frac{x \pm b}{2} \equiv 0 \pmod{2^{\alpha-2}}.$$

Hence $x \equiv \pm b \pmod{2^{\alpha-1}}$, and x must be congruent to one of $\pm b$, $\pm b + 2^{\alpha-1}$ (mod $2^\alpha$). It is immediately verified that each of these four numbers is a solution.

Combining the results of (a) and (b) gives Theorem 5-1. By the Chinese Remainder Theorem, the number of roots of $x^2 \equiv a \pmod m$ is the product of the numbers of roots of the congruences with prime-power moduli. As shown above, if a is a quadratic residue of m, this number is 2 for each odd prime-power factor, and 1, 2 or 4 according as $\alpha$ is 0 or 1, 2, or more than two, where $2^\alpha \| m$. Hence we have

THEOREM 5-2. If $(a, m) = 1$ and the congruence $x^2 \equiv a \pmod m$ is solvable, it has exactly $2^{\sigma+\tau}$ solutions, where $\sigma$ is the number of distinct odd prime divisors of m and $\tau$ is 0, 1, or 2 according as $4 \nmid m$, $2^2 \| m$, or $8 \mid m$.
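
The count in Theorem 5-2 is easy to compare with a direct enumeration. The following Python sketch is illustrative and not part of the original text; `count_square_roots` and `predicted_count` are hypothetical names, and the enumeration is exhaustive, so only small m are practical.

```python
from math import gcd

def count_square_roots(a, m):
    """Number of x in 0..m-1 with x^2 == a (mod m)."""
    return sum(1 for x in range(m) if (x * x - a) % m == 0)

def predicted_count(m):
    """2^(sigma + tau): sigma counts the distinct odd prime divisors of m,
    tau is 0, 1 or 2 according as 4 does not divide m, 4 exactly divides m,
    or 8 divides m."""
    sigma, n = 0, m
    while n % 2 == 0:
        n //= 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            sigma += 1
            while n % d == 0:
                n //= d
        d += 2
    if n > 1:
        sigma += 1
    tau = 0 if m % 4 else (1 if m % 8 else 2)
    return 2 ** (sigma + tau)

if __name__ == "__main__":
    for m in range(2, 200):
        for a in range(1, m):
            if gcd(a, m) == 1:
                c = count_square_roots(a, m)
                # Either the congruence is unsolvable or the count is 2^(sigma+tau).
                assert c in (0, predicted_count(m))
    print("Theorem 5-2 confirmed for 2 <= m < 200")
```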


PROBLEMS 


1. Decide whether 5 is a quadratic residue of 44. 

2. Show that the product of the quadratic residues of a prime p is congruent to 1 or $-1$ (mod p) according as $p \equiv -1$ or $1 \pmod 4$. [Hint: Write the residues of p in terms of a primitive root.]

*3. Prove the following generalization of Wilson's theorem: The product of the positive integers less than m and prime to m is congruent to $-1$ (mod m) if $m = 4$, $p^\alpha$, or $2p^\alpha$, and to 1 otherwise. [Hint: Proceed as in the second proof of Wilson's theorem, associating a and a' if $aa' \equiv 1 \pmod m$. Use Theorem 5-2 to count the elements associated with themselves.]


5-3 Quadratic residues of primes, and the Legendre symbol. As was seen in Section 5-2, the quadratic residues of powers of 2 can be given explicitly, and the quadratic residues of powers of an odd prime are identical with those of the prime itself. Consequently, there remains only the investigation of quadratic residues of odd primes. Hereafter we shall make use of the simplifying notation of the Legendre symbol (a/p), introduced at the end of Chapter 3. It will be recalled that for $(a, p) = 1$, we put:

$$(a/p) = \begin{cases} 1, & \text{if } a \text{ is a quadratic residue of } p, \\ -1, & \text{if } a \text{ is a quadratic nonresidue of } p. \end{cases}$$

For completeness we put (a/p) = 0 if $p \mid a$, so that (a/p) is now defined for every odd prime p.


THEOREM 5-3. The Legendre symbol (a/p) has the following properties:

(a) (ab/p) = (a/p)(b/p). Thus the product of two residues or two nonresidues is a residue; the product of a residue and a nonresidue is a nonresidue.

(b) If $a \equiv b \pmod p$, then (a/p) = (b/p).

(c) $(a^2/p) = 1$ if $p \nmid a$.

(d) $(-1/p) = (-1)^{(p-1)/2}$.

Proof: The first two parts are obvious if $p \mid ab$, so suppose that $p \nmid ab$. In the proof of Theorem 3-24 it was shown that $(a/p) \equiv a^{(p-1)/2} \pmod p$. Hence

$$(ab/p) \equiv (ab)^{(p-1)/2} = a^{(p-1)/2} b^{(p-1)/2} \equiv (a/p)(b/p) \pmod p,$$

and since (a/p) assumes only the values $\pm 1$, it follows that (ab/p) = (a/p)(b/p). Property (d) also follows immediately from this congruence. Properties (b) and (c) are obvious.


It follows from Theorem 5-3 that in investigating the Legendre symbol (a/p), there will be no loss in generality in assuming that a is a positive prime. For example, Theorem 5-3 shows that

    (-48/31) = (-1/31)(48/31) = (-1/31)(3/31)(16/31)
             = (-1/31)(3/31)
             = (30/31)(3/31) = (2/31)(3/31)(5/31)(3/31)
             = (2/31)(5/31),

so that (-48/31) can be evaluated either from

$$(-48/31) = (-1)^{(31-1)/2}(3/31) = -(3/31)$$

or from

$$(-48/31) = (2/31)(5/31).$$

In general, (a/p) can be written as the product of Legendre symbols, in which the first entries are the distinct prime divisors of a which divide a to an odd power.

Although it will be used only in the case where a is prime, the following theorem is valid for all a's for which $p \nmid a$.


THEOREM 5-4 (Gauss's lemma). If $\mu$ is the number of elements of the set $a, 2a, \ldots, \tfrac12(p - 1)a$ whose numerically least residues (mod p) are negative, then

$$(a/p) = (-1)^\mu.$$

Example: If $a = 3$, $p = 31$, the numerically least residues (mod 31) of $3 \cdot 1, 3 \cdot 2, \ldots, 3 \cdot 15$ are 3, 6, 9, 12, 15, -13, -10, -7, -4, -1, 2, 5, 8, 11, 14; thus $\mu = 5$, (3/31) = -1, and from the above numerical example, (-48/31) = 1.
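
Gauss's lemma translates directly into a short computation. The Python sketch below is illustrative and not from the original text; the names `gauss_mu`, `legendre_by_gauss`, and `legendre_by_euler` are hypothetical, and p is assumed to be an odd prime not dividing a.

```python
def gauss_mu(a, p):
    """Number of k in 1..(p-1)/2 for which k*a has a negative numerically
    least residue mod p, i.e. a least positive residue exceeding p/2."""
    return sum(1 for k in range(1, (p - 1) // 2 + 1) if (k * a) % p > p // 2)

def legendre_by_gauss(a, p):
    """(a/p) computed as (-1)^mu."""
    return (-1) ** gauss_mu(a, p)

def legendre_by_euler(a, p):
    """(a/p) computed from Euler's criterion a^((p-1)/2) mod p."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

if __name__ == "__main__":
    print("mu for a = 3, p = 31:", gauss_mu(3, 31))
    print("(3/31)   =", legendre_by_gauss(3, 31))
    print("(-48/31) =", legendre_by_gauss(-48, 31))
```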


Proof: Replace the numbers of the set $a, 2a, \ldots, \tfrac12(p - 1)a$ by their numerically smallest residues (mod p); denote the positive ones by $r_1, r_2, \ldots$ and the negative ones by $-r_1', -r_2', \ldots$. Clearly no two $r_i$'s are equal, and no two $r_j'$'s are equal. If $m_1 a \equiv r_i$ and $m_2 a \equiv -r_j' \pmod p$, then $r_i = r_j'$ would imply $a(m_1 + m_2) \equiv 0 \pmod p$, which implies $m_1 + m_2 \equiv 0 \pmod p$, and this is impossible because the m's are strictly between 0 and p/2. Hence the $(p - 1)/2$ numbers $r_i$, $r_j'$ are distinct integers between 1 and $(p - 1)/2$ inclusive, and are therefore exactly the numbers $1, 2, \ldots, (p - 1)/2$ in some order. Hence,

$$a \cdot 2a \cdots \frac{p - 1}{2}a \equiv (-1)^\mu \left(\frac{p - 1}{2}\right)! \pmod p,$$

$$a^{(p-1)/2} \equiv (-1)^\mu \pmod p.$$

Since also $a^{(p-1)/2} \equiv (a/p) \pmod p$, it follows that

$$(a/p) \equiv (-1)^\mu \pmod p,$$

and finally,

$$(a/p) = (-1)^\mu.$$


In distinction to Euler's criterion, Gauss's lemma can be used to characterize the primes of which a given integer a is a quadratic residue. For example, if $a = 2$, then $\mu$ is the number of numbers 2m, with $1 \le m \le (p - 1)/2$, which are greater than p/2; this is clearly true if and only if $m > p/4$. Thus if we write [x] to stand for the largest integer not exceeding x, it follows that

$$\mu = \frac{p - 1}{2} - \left[\frac{p}{4}\right].$$

If now

    p = 8k + 1, then mu = 4k - [2k + 1/4] = 4k - 2k = 0 (mod 2),
    p = 8k + 3, then mu = 4k + 1 - [2k + 3/4] = 4k + 1 - 2k = 1 (mod 2),
    p = 8k + 5, then mu = 4k + 2 - [2k + 1 + 1/4] = 2k + 1 = 1 (mod 2),
    p = 8k + 7, then mu = 4k + 3 - [2k + 1 + 3/4] = 2k + 2 = 0 (mod 2),

and we deduce that 2 is a quadratic residue of primes of the form $8k \pm 1$ and a nonresidue of primes $8k \pm 3$. Since it happens that the quantity $(p^2 - 1)/8$ satisfies exactly the same congruences as $\mu$ above, this result can be stated in the following form.

THEOREM 5-5. $(2/p) = (-1)^{(p^2-1)/8}$.
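
A quick numerical check of Theorem 5-5 is sketched below in Python. It is illustrative only and not from the original text; the helper names are hypothetical, and quadratic residuacity of 2 is decided by listing squares, which is adequate for small primes.

```python
def odd_primes_below(limit):
    """Odd primes less than limit, by trial division."""
    return [p for p in range(3, limit, 2)
            if all(p % d for d in range(3, int(p ** 0.5) + 1, 2))]

def two_is_square_mod(p):
    """True if x^2 == 2 (mod p) has a solution."""
    return any(x * x % p == 2 % p for x in range(1, p))

def theorem_5_5_agrees(p):
    """Compare the direct test with the sign (-1)^((p^2 - 1)/8)."""
    sign = (-1) ** ((p * p - 1) // 8)
    return two_is_square_mod(p) == (sign == 1)

if __name__ == "__main__":
    residues = [p for p in odd_primes_below(50) if two_is_square_mod(p)]
    print("primes below 50 of which 2 is a residue:", residues)
    print(all(theorem_5_5_agrees(p) for p in odd_primes_below(300)))
```

The primes listed are exactly those congruent to 1 or 7 (mod 8), in line with the case analysis preceding the theorem.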

As an application of Theorem 5-5, we have

THEOREM 5-6. (a) 2 is a primitive root of the prime $p = 4q + 1$ if q is an odd prime.

(b) 2 is a primitive root of $p = 2q + 1$ if q is a prime of the form $4k + 1$.

(c) $-2$ is a primitive root of $p = 2q + 1$ if q is a prime of the form $4k - 1$.

Proof: (a) If $\operatorname{ord}_p 2 = t$, then $t \mid p - 1$, which is equivalent to saying that $t \mid 4q$. Aside from 4, every proper divisor of 4q is also a divisor of 2q, and if $2^4 \equiv 1 \pmod p$, then p is 5 and q is not prime. Hence it suffices to show that $2^{2q} \not\equiv 1 \pmod p$. But

$$2^{2q} = 2^{(p-1)/2} \equiv (2/p) = (-1)^{(p^2-1)/8} = (-1)^{2q^2+q} = -1 \pmod p.$$




Parts (b) and (c) can be proved in a similar fashion. Part (a) shows that 2 is a primitive root of 13, 29, 53, ...; part (b) shows that 2 is a primitive root of 11, 59, 83, ..., and part (c) that $-2$ is a primitive root of 7, 23, 47, .... It is an unproved conjecture that 2 is a primitive root of infinitely many primes, which would follow from Theorem 5-6 if it could be shown that there are infinitely many primes p of the kinds described in (a) and (b).

Referring to (a), this requires a proof that the function $4x + 1$ assumes prime values for infinitely many prime arguments. Unfortunately, there is no nonconstant rational function known to have this property. If one could prove that the function $x + 2$ has it, one would have proved a conjecture which is one of the outstanding problems in additive number theory: that there are infinitely many "twin primes," such as 17 and 19, or 101 and 103.


PROBLEMS 


1. Apply Gauss’s lemma to determine the primes of which —2 is a 
quadratic residue, and show that your result is consistent with Theorem 
5-3, parts (a) and (d), and Theorem 5-5. 

2. Complete the proof of Theorem 5-6. 

*3. Show that 7 is a primitive root of any prime of the form $2^{2^n} + 1$ with $n > 0$. [Hint: Show first that it suffices to prove that (7/p) = -1, and then show that any prime of the specified form is congruent to 3 or 5 (mod 7). Note that $2^4 \equiv 2 \pmod 7$.]

4. Show that the numbers $6k - 1$ and $6k + 1$ are twin primes if and only if the equation $k = 6xy \pm x \pm y$ has no solution in positive integers x and y for any of the four choices of sign. [Note that if $6k + 1 = mn$, then $m \equiv n \equiv \pm 1 \pmod 6$.] Show that this characterizes all the twin primes except 3 and 5.


5-4 The law of quadratic reciprocity. Gauss's lemma can be used to establish a deep property of the Legendre symbol which is an essential tool both in determining the quadratic character of a prime q (mod p) and in finding the primes p of which q is a quadratic residue.

THEOREM 5-7 (Quadratic reciprocity law). If p and q are distinct odd primes, then

$$(p/q)(q/p) = (-1)^{\frac12(p-1) \cdot \frac12(q-1)}.$$

In other words, (p/q) = (q/p) unless both p and q are of the form $4k - 1$, in which case (p/q) = -(q/p).


Proof: By Gauss's lemma, the numbers $\mu$ and $\nu$ in the equations

$$(q/p) = (-1)^\mu, \quad (p/q) = (-1)^\nu$$

are the numbers of the multiples

$$q, \; 2q, \; \ldots, \; \frac{p - 1}{2}q$$

and

$$p, \; 2p, \; \ldots, \; \frac{q - 1}{2}p$$

whose absolutely smallest residues (mod p) and (mod q) respectively are negative, and we need only show that

$$\mu + \nu \equiv \frac{p - 1}{2} \cdot \frac{q - 1}{2} \pmod 2.$$

If y is chosen so that

$$-\frac{p}{2} < qx - py < \frac{p}{2},$$

then clearly $qx - py$ is the numerically smallest residue of qx (mod p). From this inequality we get

$$\frac{qx}{p} - \frac12 < y < \frac{qx}{p} + \frac12.$$

Thus y is unique and non-negative; if $y = 0$ then $qx - py = qx > 0$, and there is no contribution to $\mu$ in this case. Moreover, we see that for $x \le (p - 1)/2$,

$$y < \frac{qx}{p} + \frac12 \le \frac{q(p - 1)}{2p} + \frac12 < \frac{q + 1}{2},$$

so that also $y \le (q - 1)/2$. The number $\mu$ denotes therefore the number of combinations of x and y from the sequences

$$(p) \qquad 1, \; 2, \; \ldots, \; \frac{p - 1}{2}$$

and

$$(q) \qquad 1, \; 2, \; \ldots, \; \frac{q - 1}{2}$$

respectively, for which

$$0 > qx - py > -\frac{p}{2}.$$

Similarly, $\nu$ is the number of pairs x and y from the sequences (p) and (q) respectively, for which

$$0 > py - qx > -\frac{q}{2}.$$

For any other pair x and y from (p) and (q) respectively, either

$$py - qx > \frac{p}{2}$$

or

$$py - qx < -\frac{q}{2};$$

let there be $\lambda$ of the former and $\rho$ of the latter. Then clearly

$$\frac{p - 1}{2} \cdot \frac{q - 1}{2} = \mu + \nu + \lambda + \rho.$$

Finally, as x and y run through (p) and (q) respectively, the numbers

$$x' = \frac{p + 1}{2} - x \quad\text{and}\quad y' = \frac{q + 1}{2} - y$$

run through the same sequences, but in the opposite order. And if $py - qx > p/2$, then

$$py' - qx' = p\left(\frac{q + 1}{2} - y\right) - q\left(\frac{p + 1}{2} - x\right) = \frac{p - q}{2} - (py - qx) < \frac{p - q}{2} - \frac{p}{2} = -\frac{q}{2}.$$

Hence $\lambda = \rho$, and

$$\frac{p - 1}{2} \cdot \frac{q - 1}{2} = \mu + \nu + 2\lambda \equiv \mu + \nu \pmod 2.$$


By combining the law of quadratic reciprocity with the properties of the Legendre symbol mentioned in Theorem 5-3, it is easy to evaluate (q/p) if p and q do not lie beyond the extent of the available tables of factorizations of integers. For example, 2819 and 4177 are both primes and $4177 \equiv 1 \pmod 4$, so that

    (2819/4177) = (4177/2819) = (1358/2819) = (2 · 7 · 97/2819)
                = (2/2819)(7/2819)(97/2819)
                = (-1) · (-(2819/7)) · (2819/97) = (5/7)(6/97)
                = (7/5)(2/97)(97/3)
                = (2/5)(1/3) = -1,

and so 2819 is not a quadratic residue of 4177.
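
The value just obtained can be cross-checked without the reciprocity law. The Python sketch below is illustrative and not part of the original text; it uses Euler's criterion in place of the hand computation, and the names `legendre` and `reciprocity_holds` are hypothetical. Both arguments named p or q are assumed to be odd primes.

```python
def legendre(a, p):
    """(a/p) for an odd prime p, from Euler's criterion a^((p-1)/2) mod p."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def reciprocity_holds(p, q):
    """Check (p/q)(q/p) == (-1)^((p-1)/2 * (q-1)/2) for distinct odd primes."""
    sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    return legendre(p, q) * legendre(q, p) == sign

if __name__ == "__main__":
    print("(2819/4177) =", legendre(2819, 4177))
    print("(5/7) =", legendre(5, 7), " (6/97) =", legendre(6, 97))
    small = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    print(all(reciprocity_holds(p, q) for p in small for q in small if p != q))
```

Modular exponentiation makes this check fast even for large p, but unlike the reciprocity law it does not describe the primes p of which a fixed q is a residue.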
Moreover, the quadratic reciprocity law can be used to determine 
the primes p of which a given prime q is a quadratic residue. This 


result, which is contained in the next theorem, has sometimes been 
taken as the quadratic reciprocity law, rather than Theorem 5-7. 


THEOREM 5-8. Every p ≠ q can be uniquely represented in the form
4qk ± a, where 0 < a < 4q and a ≡ 1 (mod 4). For a fixed odd
prime q, the solutions of the equation (q/p) = 1 are exactly the primes
p ≠ q such that the corresponding a is a quadratic residue of q; that is,
(q/p) = (a/q). The numbers a such that

0 < a < 4q,  a ≡ 1 (mod 4)  and  (a/q) = 1,   (1)

are given by the least positive residues (mod 4q) of the numbers 1², 3²,
..., (q − 2)².


Proof: Clearly every odd number can be written in the form
4qk′ + a′, where 1 ≤ a′ < 4q, and a′ is odd. If a′ ≡ 1 (mod 4),
take a = a′ and k = k′, while if a′ ≡ −1 (mod 4), take a = 4q − a′
and k = k′ + 1. Thus every odd number, and therefore every p,
has a representation either as 4qk + a (if a′ ≡ 1 (mod 4)) or as
4qk − a (if a′ ≡ −1 (mod 4)). This proves the first sentence.

If p ≡ a (mod 4q), then p ≡ 1 (mod 4), so that

(q/p) = (p/q) = (a/q).

If, on the other hand, p ≡ −a (mod 4q), then p ≡ −1 (mod 4), and
\begin{align*}
(q/p) &= (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} (p/q) = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} (-a/q) \\
      &= (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} (-1)^{\frac{q-1}{2}} (a/q) \\
      &= (-1)^{\frac{p+1}{2}\cdot\frac{q-1}{2}} (a/q) = (a/q).
\end{align*}
Thus always (q/p) = (a/q), which proves the second sentence.

Finally, if (a/q) = 1, there is an x such that

x² ≡ a (mod q)  and  1 ≤ x ≤ q − 1,

whence also

(q − x)² ≡ a (mod q)  and  1 ≤ q − x ≤ q − 1.

Since either x or q − x is odd (call it x′), we have

x′² ≡ a (mod q),  1 ≤ x′ ≤ q − 2,  x′ ≡ 1 (mod 2).

But then

x′² ≡ 1 ≡ a (mod 4),

so that

x′² ≡ a (mod 4q),

and the proof is complete.

To illustrate, take q = 3. Then the only integer satisfying the
conditions (1) is 1, so that 3 is a quadratic residue of primes 12k ± 1.
Every other odd number is of one of the forms 12k ± 3 or 12k ± 5,
and no prime except 3 occurs in the progressions 12k ± 3. Hence
(3/p) is completely determined by the equations
\[ (3/p) = \begin{cases} 1, & \text{if } p \equiv \pm 1 \pmod{12}, \\ -1, & \text{if } p \equiv \pm 5 \pmod{12}. \end{cases} \]
Similarly, taking q = 17 we consider the squares
1², 3², 5², 7², 9², 11², 13², 15²,
which reduce (mod 68) to
1, 9, 25, 49, 13, 53, 33, 21.
We have that 17 is a quadratic residue of primes of the forms
68k ± 1, 9, 13, 21, 25, 33, 49, and 53,
and a nonresidue of primes of the forms
68k ± 5, 29, 37, 41, 45, 57, 61, and 65;
17 itself is the only prime of the forms 68k ± 17.
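
The two illustrations can be reproduced mechanically. The following is only a sketch under stated assumptions: the function names are invented for this example, the Legendre symbol is computed by Euler's criterion rather than by the theorem, and primality is tested by naive trial division.

```python
def residue_classes(q: int) -> list[int]:
    """Least positive residues mod 4q of 1^2, 3^2, ..., (q-2)^2, sorted."""
    return sorted({(x * x) % (4 * q) for x in range(1, q - 1, 2)})


def a_of(p: int, q: int) -> int:
    """The a of Theorem 5-8: p = 4qk + a or 4qk - a, 0 < a < 4q, a = 1 (mod 4)."""
    r = p % (4 * q)
    return r if r % 4 == 1 else 4 * q - r


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) by Euler's criterion (an assumption of this sketch)."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


print(residue_classes(3))   # classes a for q = 3
print(residue_classes(17))  # classes a for q = 17
```

Comparing `legendre(q, p) == 1` with `a_of(p, q) in residue_classes(q)` over a range of primes p gives a quick empirical check of the second sentence of the theorem.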

In general, out of the 2q progressions 4qk ± a, q − 1 contain only
primes of which q is a residue, q − 1 contain only primes of which q is
a nonresidue, and two (either 4qk ± q or 4qk ± 3q, according as
q ≡ 1 or 3 (mod 4)) contain no primes besides q itself.




Determining the primes of which a composite number is a quadratic
residue is somewhat more complicated. To illustrate, consider the
problem of finding the primes p for which (10/p) = 1. This requires
that either (2/p) = (5/p) = 1 or (2/p) = (5/p) = −1, so that
either

p ≡ ±1 (mod 8) and p ≡ ±1 (mod 10)

or

p ≡ ±3 (mod 8) and p ≡ ±3 (mod 10),

all combinations of signs being allowed. Thus we have the following
pairs of congruences, each pair to be solved simultaneously:

p ≡ 1 (mod 8),   p ≡ 1 (mod 10);
p ≡ −1 (mod 8),  p ≡ −1 (mod 10);
p ≡ 1 (mod 8),   p ≡ −1 (mod 10);
p ≡ −1 (mod 8),  p ≡ 1 (mod 10);
p ≡ 3 (mod 8),   p ≡ 3 (mod 10);
p ≡ −3 (mod 8),  p ≡ −3 (mod 10);
p ≡ 3 (mod 8),   p ≡ −3 (mod 10);
p ≡ −3 (mod 8),  p ≡ 3 (mod 10).

Solving (by the method of Problem 3, Section 3-4, for example), we
obtain p ≡ 1, −1, 9, 31, 3, −3, 27, 13 (mod 40);
that is, 10 is a quadratic residue of the primes 40k ± 1, 3, 9, 13, and a
nonresidue of the others.
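
As a sanity check on the eight classes, one can tabulate the residues (mod 40) of the primes p for which 10 is a quadratic residue. This is a minimal sketch: it assumes Euler's criterion for the Legendre symbol, uses trial-division primality, and stops at an arbitrary bound of 2000.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) by Euler's criterion (assumed, not derived here)."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


# Residues mod 40 of the odd primes p != 5 below 2000 with (10/p) = 1.
classes = sorted({p % 40 for p in range(3, 2000)
                  if is_prime(p) and p != 5 and legendre(10, p) == 1})
print(classes)
```

The classes 27, 31, 37, 39 that appear are −13, −9, −3, −1 (mod 40), in agreement with the forms 40k ± 1, 3, 9, 13.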


PROBLEMS 


1. Evaluate the Legendre symbols (503/773) and (501/773). 

2. Characterize the primes of which 5 is a quadratic residue; those of 
which 6 is a quadratic residue. 

3. Show that if p = 4m + 1 and d|m, then (d/p) = 1. [Hint: Let q be
a prime divisor of m, and consider separately the cases q = 2 and q > 2.]

4. Deduce from the representation N = 6119 = 82² − 5·11² that if p|N,
then (5/p) = 1. Use this to find the factorization of N. (It suffices to
consider p < 80.) Use similar ideas to factor 43993 = 211² − 2⁴·33.

5. Prove that 4751 is prime. 


5-5 An application. It is clear that if a given integer a is
congruent to 1 (mod p) for every prime p, then a = 1, since p|(a − 1)
implies p ≤ |a| + 1 unless a − 1 = 0. Here we have an instance of
the following principle: if an assertion involving a congruence holds
for every prime modulus p, then the statement with the congruence
replaced by the corresponding equation may be implied. With this in
mind, it is natural to ask whether it is true that if, for fixed integers a
and n, a is an nth power modulo p for every p, then a must be an nth
power. (Saying that a is an nth power (mod p) means, of course, that
a is congruent to the nth power of some integer; in other words, that a
is an nth power residue of p.) Unfortunately, this is not quite the
case: if the congruence xⁿ ≡ a (mod p) is solvable for every p, then
a = bⁿ for some b if 8∤n, but if 8|n, either a = bⁿ or a = 2^{n/2}bⁿ.
Powers of 2 higher than the second cause difficulty here, just as they
did in the study of primitive roots. (Cf. Problem 1 at the end of this
section.)

At the present time, the theorem just stated cannot be proved in a 
simple way. Even in the special case n = 2 which we now treat, it is 
necessary to use a rather deep result about the existence of primes in 
certain arithmetic progressions. 


THEOREM 5-9. A fixed integer is a quadratic residue of every prime
if and only if it is a square.


Proof: If a = b², the congruence x² ≡ a (mod p) has the solution
x ≡ b (mod p) for every p.

Suppose, on the other hand, that a is not a square. Then it can be
written as ±m²p₁p₂⋯p_r, where r ≥ 1 and p_i ≠ p_j if i ≠ j. Suppose
first that a is positive; then we wish to show the existence of a
prime p such that

(a/p) = (m²p₁⋯p_r/p) = (p₁/p)⋯(p_r/p) = −1.

We attempt to find a p such that (p_i/p) = 1 if 1 ≤ i < r, while
(p_r/p) = −1. Here, of course, one of the primes p₁, ..., p_r may be
2. But since 2 is a quadratic residue of primes 8k ± 1, and a
nonresidue of primes 8k ± 5, the following statement is true for every
prime q:

If p ≡ 1 (mod 4q), then (q/p) = 1. On the other hand, for each q
there is a u such that q∤u, u ≡ 1 (mod 4), and if p ≡ u (mod 4q),
then (q/p) = −1.

The first part is obvious. When q = 2, u may be taken to be 5 in
the second part, while if q > 2, u may be taken as any of the N
numbers remaining out of the q integers between 1 and 4q which are
congruent to 1 (mod 4), after the removal of (a) the least positive
residues (mod 4q) of the (q − 1)/2 squares 1², 3², ..., (q − 2)², and
(b) that one of q, 3q which is congruent to 1 (mod 4). Since
\[ N = q - \frac{q-1}{2} - 1 = \frac{q-1}{2} \ge 1, \]
such an integer u exists.
Now consider the system of congruences

x ≡ 1 (mod 4p₁)
  ...
x ≡ 1 (mod 4p_{r−1})
x ≡ u (mod 4p_r),

when r > 1, or the single congruence

x ≡ u (mod 4p₁)

when r = 1, where u is the number characterized above, with q = p_r
or p₁. For r > 1, the necessary and sufficient condition that the
system be solvable is, by Theorem 3-12, that for all i and j,

(4p_i, 4p_j) | (c_i − c_j),

where c_i = 1 if i < r and c_i = u if i = r. Since c_i ≡ 1 (mod 4) for
every i, this requirement is clearly satisfied, so that the system can be
replaced by a single congruence

x ≡ u′ (mod 4p₁⋯p_r),

where (u′, 4p₁⋯p_r) = 1. If now p = 4p₁⋯p_r k + u′ or
p = 4p₁k + u, in the cases r > 1 and r = 1, respectively, then

(a/p) = 1⋯1·(−1) = −1,

and it is seen that in the case a > 0, the theorem is a consequence of
the famous


DIRICHLET'S THEOREM. If s and t are relatively prime, there are
infinitely many primes of the form sk + t.


Proofs of special cases of Dirichlet’s theorem have been indicated 
in Problem 4 of Section 3-7 and Problem 5 of Section 4-3. The 
general theorem is proved in Volume II of this work. 

If a = −m², then (a/p) = −1 if p∤a and p ≡ −1 (mod 4). If
p₁, ..., p_k is any set of primes of the form 4k − 1, then the number
4p₁⋯p_k − 1 has a prime divisor of this same form distinct from
p₁, ..., p_k, so there are infinitely many primes of this form, and in
particular there is one which does not divide a. For this p, (a/p) = −1.

If a = −m²p₁⋯p_r, where r ≥ 1 and p_i ≠ p_j if i ≠ j, then we
must find a p such that p∤a and

(−1/p)(p₁/p)⋯(p_r/p) = −1.

But if p is a prime for which (−a/p) = −1, as determined above,
then p ≡ 1 (mod 4), so that (a/p) = (−a/p). The proof is complete.
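
Theorem 5-9 can be illustrated by a direct search. The sketch below is not the construction used in the proof (which relies on Dirichlet's theorem); it simply looks for the smallest odd prime p with (a/p) = −1, assuming Euler's criterion for the symbol. The name `witness` and the search bound are hypothetical choices.

```python
from math import isqrt


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) by Euler's criterion (an assumption of this sketch)."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def witness(a: int, bound: int = 10_000):
    """Smallest odd prime p < bound with (a/p) = -1, or None if none is found."""
    for p in range(3, bound, 2):
        if is_prime(p) and legendre(a, p) == -1:
            return p
    return None


print(witness(10), witness(-4), witness(49))
```

For a perfect square the search necessarily comes back empty, while for every non-square in a small test range a witness prime turns up almost immediately.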


PROBLEMS 
1. Show that the congruence
\[ x^{2^{\alpha}} \equiv 2^{2^{\alpha-1}} \pmod p \]
has a solution for every prime p, if α ≥ 3. [Hint: Consider the
factorization
\[ x^{2^{\alpha}} - 2^{2^{\alpha-1}}
   = (x^2 - 2)(x^2 + 2)\bigl((x-1)^2 + 1\bigr)\bigl((x+1)^2 + 1\bigr)(x^{2^3} + 2^{2^2}) \cdots (x^{2^{\alpha-1}} + 2^{2^{\alpha-2}}), \]
and show that every p divides one of the first three factors for suitable x.]

2. Show that if the congruence xⁿ ≡ a (mod m) is solvable for every m,
then a is an nth power. [Hint: Consider the moduli p^{α+1}, where p^α‖a
and α is positive.]


5-6 The Jacobi symbol. As was pointed out at the end of the
proof of the law of quadratic reciprocity, it is necessary to have
available rather extensive factorization tables if one is to evaluate
Legendre symbols with large entries. Partly to obviate such a list,
and partly for theoretical purposes, it has been found convenient to
extend the definition of the Legendre symbol (a/p) so as to give
meaning to (a/b) when b is not a prime. This is done in the following
way: put (a/1) = 1, and if b is greater than 1 and odd, put

(a/b) = (a/p₁)(a/p₂)⋯(a/p_r),   (2)

where p₁p₂⋯p_r is the prime factorization of b, and the symbols on
the right in (2) are Legendre symbols. Then the symbol on the left
in (2) is called a Jacobi symbol; like the Legendre symbol, it is
undefined for even second entry. As we shall see, many more of its
properties are similar to those of the Legendre symbol, but there is one
crucial point at which the analogy breaks down: it may happen that
(a/b) = 1 even when a is not a quadratic residue of b. For it is
clearly necessary that each of the Legendre symbols (a/p_i) have the
value 1 in order for a to be a residue of b, while (a/b) = 1 if an even
number of the factors in (2) are −1 while the remainder are +1. On
the other hand, a is certainly not a quadratic residue of b if (a/b) = −1.
The following theorem lists properties of the Jacobi symbol which 
were proved for the Legendre symbol in Theorems 5-3, 5-5, and 5-7, 
together with one (the second) which is peculiar to the extended 
function. 
THEOREM 5-10. The Jacobi symbol has these properties:
(a) (a₁a₂/b) = (a₁/b)(a₂/b).
(b) (a/b₁b₂) = (a/b₁)(a/b₂).
(c) If a₁ ≡ a₂ (mod b), then (a₁/b) = (a₂/b).
(d) (−1/b) = (−1)^{(b−1)/2}.
(e) (2/b) = (−1)^{(b²−1)/8}.
(f) If (a, b) = 1, then (a/b)(b/a) = (−1)^{½(a−1)·½(b−1)}.
Here the second entry in each symbol is a positive odd number.

Proof: (a) Put b = p₁⋯p_r. Then

(a₁a₂/b) = (a₁a₂/p₁)⋯(a₁a₂/p_r),

and since these are Legendre symbols,

(a₁a₂/b) = (a₁/p₁)⋯(a₁/p_r)(a₂/p₁)⋯(a₂/p_r) = (a₁/b)(a₂/b).

(b) Put b₁ = p₁⋯p_r and b₂ = p₁′⋯p_s′. Then

(a/b₁b₂) = (a/p₁⋯p_r p₁′⋯p_s′)
         = ((a/p₁)⋯(a/p_r))((a/p₁′)⋯(a/p_s′))
         = (a/b₁)(a/b₂).

(c) If a₁ ≡ a₂ (mod b) and b = p₁⋯p_r, then a₁ ≡ a₂ (mod p_i)
for i = 1, ..., r. Hence (a₁/p_i) = (a₂/p_i), and

(a₁/b) = (a₁/p₁)⋯(a₁/p_r) = (a₂/p₁)⋯(a₂/p_r) = (a₂/b).

(d) Put b = p₁⋯p_r. Then
\[ (-1/b) = \prod_{i=1}^{r} (-1/p_i) = \prod_{i=1}^{r} (-1)^{(p_i - 1)/2}, \]
so that
\[ (-1/b) = (-1)^{\sum_{i=1}^{r} (p_i - 1)/2}. \tag{3} \]
But if m and n are odd, then

(m − 1)(n − 1) ≡ 0 (mod 4),
mn − 1 ≡ (m − 1) + (n − 1) (mod 4),
\[ \frac{mn - 1}{2} \equiv \frac{m-1}{2} + \frac{n-1}{2} \pmod 2. \]
Repeated application of this fact shows that
\[ \sum_{i=1}^{r} \frac{p_i - 1}{2} \equiv \frac{p_1 \cdots p_r - 1}{2} = \frac{b-1}{2} \pmod 2, \]
so that (−1/b) = (−1)^{(b−1)/2}, by (3).

(e) The proof of this is the same as that just given, except that,
using the fact that m² ≡ 1 (mod 8) if m is odd, we deduce from the
congruence

(m² − 1)(n² − 1) ≡ 0 (mod 64)

that
\[ \frac{m^2 - 1}{8} + \frac{n^2 - 1}{8} \equiv \frac{(mn)^2 - 1}{8} \pmod 2. \]
(f) Put a = p₁⋯p_r, b = p₁′⋯p_s′. Then
\begin{align*}
(a/b)(b/a) &= \prod_{j=1}^{s} (a/p_j') \prod_{i=1}^{r} (b/p_i) \\
  &= \prod_{j=1}^{s} \prod_{i=1}^{r} (p_i/p_j') \cdot \prod_{i=1}^{r} \prod_{j=1}^{s} (p_j'/p_i) \\
  &= \prod_{i=1}^{r} \prod_{j=1}^{s} (p_i/p_j')(p_j'/p_i) \\
  &= (-1)^{\sum_{i=1}^{r} \sum_{j=1}^{s} \frac{p_i - 1}{2} \cdot \frac{p_j' - 1}{2}}
   = (-1)^{\sum_{i=1}^{r} \frac{p_i - 1}{2} \cdot \sum_{j=1}^{s} \frac{p_j' - 1}{2}} \\
  &= (-1)^{\frac{a-1}{2} \cdot \frac{b-1}{2}}.
\end{align*}


Because the laws of operation and combination are the same for
the two types, Jacobi symbols can be used (and according to the same
rules) in evaluating Legendre symbols, even though they do not give
complete information about the quadratic character of a modulo b;
all that is required is that one begin with a Legendre symbol. This
means that the first entry in each symbol does not have to be factored,
except that powers of 2 must be removed. Thus, using the numerical
example considered earlier, we have

(2819/4177) = (4177/2819) = (1358/2819) = (2/2819)(679/2819)
            = −(679/2819) = (2819/679) = (103/679)
            = −(679/103) = −(61/103) = −(103/61)
            = −(42/61) = −(2/61)(21/61) = (61/21)
            = (19/21) = (21/19) = (2/19) = −1,

and we can again conclude that 2819 is a nonresidue of 4177.
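
The reduction just carried out by hand follows a fixed pattern: reduce the first entry, strip powers of 2 with rule (e), and flip with rule (f). A minimal sketch of that loop is given below; the function name `jacobi` and the choice to return 0 when the entries share a factor are conventions adopted for this example, not taken from the text.

```python
def jacobi(a: int, b: int) -> int:
    """Jacobi symbol (a/b) for odd b > 0, using rules (c), (e), (f) of Theorem 5-10."""
    if b <= 0 or b % 2 == 0:
        raise ValueError("second entry must be a positive odd number")
    a %= b                              # rule (c)
    sign = 1
    while a != 0:
        while a % 2 == 0:               # remove factors of 2 with rule (e)
            a //= 2
            if b % 8 in (3, 5):
                sign = -sign
        a, b = b, a                     # reciprocity, rule (f)
        if a % 4 == 3 and b % 4 == 3:
            sign = -sign
        a %= b
    return sign if b == 1 else 0        # 0 when the entries are not coprime


print(jacobi(2819, 4177))
print(jacobi(2, 15), any(x * x % 15 == 2 for x in range(15)))
```

The second line of output illustrates the point at which the analogy with the Legendre symbol breaks down: (2/15) = (2/3)(2/5) = 1, yet 2 is not a square modulo 15.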


PROBLEM 


Evaluate (751/919), both with and without the use of Jacobi symbols. 
The entries are primes. 


REFERENCES 


Section 5-5 


The general theorem stated in the first paragraph is due to E. Trost, 
Nieuw Archief voor Wiskunde (Amsterdam) 18, 58-61 (1934). It has been 
generalized by H. Flanders, Annals of Mathematics 57, 392-400 (1953). 


CHAPTER 6 


NUMBER-THEORETIC FUNCTIONS AND THE 
DISTRIBUTION OF PRIMES 


6-1 Introduction. A number-theoretic function is any function 
which is defined for positive integral argument or arguments. Euler’s 
g-function is such, as are n!, n”, e”, etc. The functions which are in- 
teresting from the point of view of number theory are, of course, those 
like φ whose value depends in some way on the arithmetic nature of
the argument, and not simply on its size. Two of the most interesting
of such functions are τ(n), the number of positive divisors of n, and
σ(n), the sum of these divisors. These functions have been treated
extensively in the literature, partly because of their simplicity and 
partly because they occur in a natural way in the investigation of 
many other problems. For this reason we shall pause briefly to dem- 
onstrate some of their fundamental properties. Recall that, as noted 
in Chapter 3, a number-theoretic function which is not identically zero 
is said to be multiplicative if f(mn) = f(m)f(n) whenever (m, n) = 1. 


THEOREM 6-1. The functions σ and τ are multiplicative.


Proof: Assume that (m, n) = 1. Then by the Unique Factorization
Theorem, every divisor of mn can be represented uniquely as the
product of a divisor of m and a divisor of n, and conversely, every such
product is a divisor of mn. Clearly this implies that τ is
multiplicative, and that
\[ \sum_{d|m} d \cdot \sum_{d'|n} d' = \sum_{d''|mn} d'', \]
so that also σ(m)σ(n) = σ(mn).


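If f is any multiplicative function and the prime-power
factorization of n is
\[ n = \prod_{i=1}^{r} p_i^{\alpha_i}, \]
then clearly
\[ f(n) = f\Bigl(\prod_{i=1}^{r} p_i^{\alpha_i}\Bigr) = \prod_{i=1}^{r} f(p_i^{\alpha_i}), \]
and so the function is completely determined when its value is known
for every prime-power argument. In the cases at hand, we have
\[ \tau(p^{\alpha}) = \alpha + 1 \]
and
\[ \sigma(p^{\alpha}) = 1 + p + \cdots + p^{\alpha} = \frac{p^{\alpha+1} - 1}{p - 1}. \]
Thus we have proved


THEOREM 6-2. If n = p₁^{α₁} ⋯ p_r^{α_r}, then
\[ \tau(n) = \prod_{i=1}^{r} (\alpha_i + 1) \quad\text{and}\quad
   \sigma(n) = \prod_{i=1}^{r} \frac{p_i^{\alpha_i+1} - 1}{p_i - 1}. \]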
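
The formulas of Theorem 6-2 translate directly into code. The sketch below assumes a naive trial-division factorization (adequate only for small n); the names `factorize`, `tau`, and `sigma` are illustrative.

```python
def factorize(n: int) -> dict[int, int]:
    """Prime-power factorization {p: alpha} of n >= 1 by trial division."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def tau(n: int) -> int:
    """Number of positive divisors: product of (alpha_i + 1)."""
    result = 1
    for alpha in factorize(n).values():
        result *= alpha + 1
    return result


def sigma(n: int) -> int:
    """Sum of positive divisors: product of (p^(alpha+1) - 1) / (p - 1)."""
    result = 1
    for p, alpha in factorize(n).items():
        result *= (p ** (alpha + 1) - 1) // (p - 1)
    return result


print(tau(12), sigma(12))
```

For n = 12 = 2²·3 the divisors are 1, 2, 3, 4, 6, 12, so the two printed values can be checked by hand.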


There is another way of proving the multiplicativity of σ and τ
which uses a basic property of all multiplicative functions:


THEOREM 6-3. If f is multiplicative and F is the function defined by
the equation
\[ F(n) = \sum_{d|n} f(d), \]
then F is also multiplicative.


Remark: The multiplicativity of σ and τ follows immediately from
the relations
\[ \sigma(n) = \sum_{d|n} d, \qquad \tau(n) = \sum_{d|n} 1, \]
since the functions f₁ and f₂ defined by the equations

f₁(n) = n  and  f₂(n) = 1  for all n

are obviously multiplicative.

Proof: Let (m, n) = 1. Then every divisor d of mn can be written
uniquely as the product of a divisor d₁ of m and a divisor d₂ of n, and
(d₁, d₂) = 1. Hence
\begin{align*}
F(mn) &= \sum_{d|mn} f(d) = \sum_{d_1|m,\; d_2|n} f(d_1 d_2) = \sum_{d_1|m,\; d_2|n} f(d_1) f(d_2) \\
      &= \sum_{d_1|m} f(d_1) \cdot \sum_{d_2|n} f(d_2) = F(m)F(n).
\end{align*}
We shall see in the next section that the converse of Theorem 6-3
also holds.

A problem that was of great interest to the Greeks was that of
determining all the perfect numbers, that is, numbers such as 6 which
are equal to the sum of their proper divisors. In our notation this
amounts to asking for all solutions of the equation

σ(n) = 2n.

It was known as early as Euclid's time that every number of the form

n = 2^{p−1}(2^p − 1),

in which both p and 2^p − 1 are primes, is perfect. This is easy to
verify:
\[ \sigma(n) = \sigma(2^{p-1})\,\sigma(2^p - 1) = (2^p - 1) \cdot 2^p = 2n. \]
It happens that a partial converse also holds: every even perfect
number is of the Euclid type. To see this we put n = 2^{k−1}·n′,
where k ≥ 2 and n′ is odd. Then

σ(n) = σ(2^{k−1})σ(n′) = (2^k − 1)σ(n′),

so that if n is perfect, it must be that

(2^k − 1)σ(n′) = 2n = 2^k n′.

This implies that (2^k − 1)|n′, so we put n′ = (2^k − 1)n″ and obtain

σ(n′) = 2^k n″.

Since n′ and n″ are divisors of n′ whose sum is

n″ + (2^k − 1)n″ = 2^k n″ = σ(n′),

it must be that they are the only divisors of n′, so that n′ must be
prime, and so n″ = 1, n′ = 2^k − 1. Thus n = 2^{k−1}(2^k − 1),
where 2^k − 1 is prime; this can happen only if k itself is prime.

There are two problems connected with perfect numbers which 
have not yet been solved. One is whether there are any odd perfect 
numbers; various necessary conditions are known for an odd number 
to be perfect, which show that any such number must be extremely 
large, but no conclusive results have been obtained. The other 
question is about the primes p for which 2^p − 1 is prime. These
Mersenne primes 2^p − 1 are completely known for p < 2300 (the
corresponding p’s are 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 
607, 1279, 2203, 2281), but it is not known whether there are infinitely 
many such primes. 
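
The correspondence between Euclid's form and the equation σ(n) = 2n is easy to observe on a small range. The following sketch assumes a brute-force divisor sum and trial-division primality, and the cutoffs (n < 10000, p < 14) are arbitrary choices made for this illustration.

```python
from math import isqrt


def sigma(n: int) -> int:
    """Sum of the positive divisors of n, pairing d with n // d."""
    total = 0
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    return total


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


# All perfect numbers below 10000, found by testing sigma(n) = 2n directly.
perfect = [n for n in range(2, 10_000) if sigma(n) == 2 * n]

# Numbers of Euclid's form 2^(p-1) (2^p - 1) with p and 2^p - 1 both prime.
euclid = [2 ** (p - 1) * (2 ** p - 1)
          for p in range(2, 14) if is_prime(p) and is_prime(2 ** p - 1)]

print(perfect)
print(euclid)
```

Every entry of the first list appears in the second, as the partial converse requires; p = 11 is skipped because 2¹¹ − 1 = 2047 = 23·89.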




Aside from φ, σ, and τ, the function with which we shall be most
concerned in this chapter is π(x), already defined in Chapter 1 as the
number of primes not exceeding x. (We now drop the restriction that
all variables are integer-valued.) It was shown there that π(x)
increases indefinitely with x, that is, that there are infinitely many primes.
We now give another proof, which depends on the Unique
Factorization Theorem.

Assume that there are only k primes, say p₁, ..., p_k. By the
Unique Factorization Theorem, every integer larger than 1 can be
written uniquely as the product of a square-free number (that is, an
integer which is the product of distinct primes) and a square. But
with only k primes at our disposal, there are only
\[ \binom{k}{1} + \cdots + \binom{k}{k-1} + 1 = 2^k - 1 < 2^k \]
square-free numbers, and there are not more than √n perfect squares
less than or equal to n. This means that there are fewer than 2^k·√n
positive integers not exceeding n, which is obviously false if n ≥ 2^k·√n,
that is, if √n ≥ 2^k. Actually, this argument proves a little more,
namely that
\[ 2^{\pi(n)} > \sqrt{n}, \quad\text{or}\quad \pi(n) > \frac{\log n}{2 \log 2}. \]
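
The lower bound for π(n) obtained from this argument is very weak, as a short computation shows. The sketch below assumes a simple sieve of Eratosthenes; the helper name `prime_counts` and the limit 5000 are choices made only for this illustration.

```python
from math import isqrt, log


def prime_counts(limit: int) -> list[int]:
    """counts[n] = number of primes not exceeding n, for 0 <= n <= limit (limit >= 1)."""
    is_p = [True] * (limit + 1)
    is_p[0] = is_p[1] = False
    for i in range(2, isqrt(limit) + 1):
        if is_p[i]:
            for j in range(i * i, limit + 1, i):
                is_p[j] = False
    counts, running = [], 0
    for n in range(limit + 1):
        running += is_p[n]
        counts.append(running)
    return counts


pi = prime_counts(5000)
for n in (10, 100, 1000, 5000):
    print(n, pi[n], log(n) / (2 * log(2)))
```

Already at n = 1000 the true count is far above the logarithmic bound, which is consistent with the remark that the argument proves only "a little more" than the infinitude of primes.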


For later use in this chapter we now prove a general combinatorial
theorem of very wide applicability. (The product representation for
the φ-function, for example, is a special case.) The result is sometimes
called the principle of cross-classification.


THEOREM 6-4. Let S be a set of N distinct elements, and let S₁, ...,
S_r be arbitrary subsets of S containing N₁, ..., N_r elements,
respectively. For 1 ≤ i < j < ⋯ < l ≤ r, let S_{ij⋯l} be the
intersection of S_i, S_j, ..., S_l, that is, the set of all elements of S common to
S_i, S_j, ..., S_l; and let N_{ij⋯l} be the number of elements of S_{ij⋯l}.
Then the number of elements of S not in any of S₁, ..., S_r is
\[ K = N - \sum_{1 \le i \le r} N_i + \sum_{1 \le i < j \le r} N_{ij} - \sum_{1 \le i < j < k \le r} N_{ijk} + \cdots + (-1)^r N_{12 \cdots r}. \]
Remark: To obtain the product formula for the φ-function, take S
to be the set of integers 1, ..., n, and for 1 ≤ k ≤ r, take S_k to be the
set of elements of S which are divisible by p_k, where n = p₁^{α₁} ⋯ p_r^{α_r}.
If d|n, the number of integers s ≤ n such that d|s is n/d; hence
\[ \varphi(n) = n - \sum_{1 \le i \le r} \frac{n}{p_i} + \sum_{1 \le i < j \le r} \frac{n}{p_i p_j} - \cdots = n \prod_{p|n} \Bigl(1 - \frac{1}{p}\Bigr). \]

Proof: Let a certain element s of S belong to exactly m of the sets
S₁, ..., S_r. If m = 0, s is counted just once, in N itself. If
0 < m ≤ r, then s is counted once, or \binom{m}{0} times, in N, \binom{m}{1} times in
the terms N_i, \binom{m}{2} times in the terms N_{ij}, etc. Hence the total
contribution to K arising from the element s is
\[ \binom{m}{0} - \binom{m}{1} + \binom{m}{2} - \cdots + (-1)^m \binom{m}{m} = (1 - 1)^m = 0. \]
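
The Remark can be turned into a small computation: summing (−1)^m · n/(p_{i₁}⋯p_{i_m}) over all subsets of the prime divisors of n should reproduce φ(n). The sketch below assumes trial-division factoring, and the function names are invented for this example.

```python
from itertools import combinations
from math import gcd, prod


def prime_divisors(n: int) -> list[int]:
    """Distinct prime divisors of n, by trial division."""
    primes, d = [], 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes


def phi_cross_classification(n: int) -> int:
    """K of Theorem 6-4 with S = {1..n} and S_k = multiples of p_k in S."""
    primes = prime_divisors(n)
    total = 0
    for m in range(len(primes) + 1):
        for subset in combinations(primes, m):
            # N_{ij...l} = n / (p_i p_j ... p_l); the empty product is 1.
            total += (-1) ** m * (n // prod(subset))
    return total


print(phi_cross_classification(60))
```

The number of terms doubles with each additional prime divisor, so in practice the product form n·∏(1 − 1/p) is preferable; the alternating sum is shown only because it mirrors the theorem.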


PROBLEMS 


1. Find an expression for σ_k(n), the sum of the kth powers of the divisors
of n.

2. Prove that
\[ \sum_{d|n} \tau^3(d) = \Bigl(\sum_{d|n} \tau(d)\Bigr)^2. \]
[Hint: Both sides are multiplicative functions, so it suffices to consider
the case n = p^α. Cf. Problem 2, Section 1-2.]

3. Show that, if σ(n) is odd, then n is a square or the double of a square.

4. Show that the number of representations of an integer n as a sum of
one or more consecutive positive integers is τ(n₁), where n₁ is the largest
odd divisor of n. [Hint: If
\[ n = \sum_{k=1}^{r+s} k - \sum_{k=1}^{r} k = \tfrac{1}{2} s(s + 2r + 1), \]
then either s or s + 2r + 1 divides n₁.]

*5. Show that the number of ordered pairs of integers whose LCM is n is
τ(n²).

6. Show that
\[ \sum_{n=1}^{\infty} \frac{\tau(n)}{n^s} = \Bigl(\sum_{n=1}^{\infty} \frac{1}{n^s}\Bigr)^2 \]
if s > 1. [The series involved converge absolutely, and therefore can be
rearranged in any order.]

*7. (a) Show that the sum of the odd divisors of n is
\[ -\sum_{d|n} (-1)^{n/d} d. \]
[Hint: Let d₁ be an odd divisor of n, and find the total contribution to this
sum from all divisors of n of the form 2^k d₁.]

(b) Show that if n is even, then
\[ \sum_{d|n} (-1)^{n/d} d = 2\sigma(n/2) - \sigma(n). \]

*8. Show that, if d|n and (n, r) = 1, then the number of solutions (mod n)
of z ≡ r (mod d), (z, n) = 1 is
\[ \frac{\varphi(n)}{\varphi(d)} = \frac{n}{d} \prod_{p|n,\; p \nmid d} \Bigl(1 - \frac{1}{p}\Bigr). \]
[Hint: Take S of Theorem 6-4 to be the n/d numbers
z = r + td, 1 ≤ t ≤ n/d.
If p|d, then p∤z. Let the subsets consist of those elements of S divisible
by the various primes which divide n but not d.]


6-2 The Möbius function. As we saw in Theorem 6-3, if f is any
multiplicative function and F is its sum function, so that
\[ F(n) = \sum_{d|n} f(d), \]
then F is also multiplicative. We now ask whether the converse is
true, that is, whether the multiplicativity of F implies the multiplicativity
of f. To this end we attempt to express f(n) as a sum, over the
divisors of n, of terms involving F(d). Assuming that F is
multiplicative, it is enough to consider F(p^n), and if the converse in question
is valid we can also restrict attention to f(p^n). Since

f(p^n) = F(p^n) − F(p^{n−1}),

we can write
\[ f(p^n) = \sum_{\alpha=0}^{n} \mu(p^{n-\alpha}) F(p^{\alpha}) = \sum_{d|p^n} \mu\Bigl(\frac{p^n}{d}\Bigr) F(d), \]
if we define the function μ in the following way:

μ(1) = 1,
μ(p) = −1,
μ(p^n) = 0 for n > 1.

If we now require in addition that μ be multiplicative, then μ(n) is
defined for all positive integral n, and it is easily seen that
\[ \mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n \text{ is divisible by a square larger than 1}, \\ (-1)^r & \text{if } n = p_1 \cdots p_r, \text{ where the } p_i \text{ are distinct primes.} \end{cases} \]

This function μ is commonly called the Möbius function; it plays
an important role in the theory of numbers. On the basis of the
heuristic argument above, it is reasonable to conjecture that, for any n,
\[ f(n) = \sum_{d|n} \mu\Bigl(\frac{n}{d}\Bigr) F(d), \]
and that from this formula one might be able to deduce the
multiplicativity of f from that of F. We now substantiate these conjectures.


THEOREM 6-5.
\[ \sum_{d|n} \mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases} \]
Proof: By Theorem 6-3, the function
\[ M(n) = \sum_{d|n} \mu(d) \]
is multiplicative, and since
\[ M(p^{\alpha}) = \begin{cases} 1, & \text{if } \alpha = 0, \\ 1 - 1 + 0 + \cdots + 0 = 0, & \text{if } \alpha \ge 1, \end{cases} \]
we see that M(n) = 0 if n is divisible by any prime, that is, if n > 1.


THEOREM 6-6 (Möbius inversion formula). If f is any
number-theoretic function (not necessarily multiplicative) and
\[ F(n) = \sum_{d|n} f(d), \]
then
\[ f(n) = \sum_{d|n} F(d)\,\mu\Bigl(\frac{n}{d}\Bigr) = \sum_{d|n} F\Bigl(\frac{n}{d}\Bigr)\mu(d). \]
Proof: We have
\begin{align*}
\sum_{d|n} \mu(d) F\Bigl(\frac{n}{d}\Bigr) &= \sum_{d_1 d_2 = n} \mu(d_1) F(d_2) = \sum_{d_1 d_2 = n} \mu(d_1) \sum_{d|d_2} f(d) \\
  &= \sum_{d_1 d | n} \mu(d_1) f(d) = \sum_{d|n} f(d) \sum_{d_1 | (n/d)} \mu(d_1),
\end{align*}
and, by Theorem 6-5, the coefficient of f(d) is zero unless n/d = 1
(that is, unless d = n), when it is 1, so that this last sum is equal
to f(n).
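
Theorems 6-5 and 6-6 lend themselves to a direct numerical check. In the sketch below the helper names (`mobius`, `divisors`, `invert`) are invented for this illustration, divisors are found by brute force, and the test case F(n) = n relies on the relation Σ_{d|n} φ(d) = n quoted from Theorem 3-9.

```python
from math import gcd


def mobius(n: int) -> int:
    """Mobius function: 0 if a square > 1 divides n, else (-1)^(number of primes)."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # d^2 divided the original n
                return 0
            result = -result
        d += 1
    if n > 1:                    # one remaining prime factor
        result = -result
    return result


def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def invert(F, n: int) -> int:
    """Recover f(n) from its sum function F via f(n) = sum_{d|n} mu(n/d) F(d)."""
    return sum(mobius(n // d) * F(d) for d in divisors(n))


def phi(n: int) -> int:
    return sum(1 for s in range(1, n + 1) if gcd(s, n) == 1)


print([mobius(n) for n in range(1, 13)])
print([invert(lambda m: m, n) for n in range(1, 13)])  # should match phi
```

Inverting F(n) = n returns Euler's function, and inverting the divisor-count function returns the constant 1, since τ is the sum function of f(n) = 1.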


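As an example of Theorem 6-6, we have


THEOREM 6-7.
\[ \varphi(n) = n \sum_{d|n} \frac{\mu(d)}{d}. \]
This follows immediately from Theorem 3-9:
\[ \sum_{d|n} \varphi(d) = n. \]
It can also be obtained directly from the product representation of
φ(n):
\[ \varphi(n) = n \prod_{p|n} \Bigl(1 - \frac{1}{p}\Bigr)
   = n \Bigl(1 - \sum_{i} \frac{1}{p_i} + \sum_{i<j} \frac{1}{p_i p_j} - \cdots\Bigr)
   = n \sum_{d|n} \frac{\mu(d)}{d}. \]

THEOREM 6-8. If
\[ F(n) = \sum_{d|n} f(d) \]
and F is multiplicative, so is f.


Proof: If (m, n) = 1, then
\begin{align*}
f(mn) &= \sum_{d_1|m,\; d_2|n} F(d_1 d_2)\, \mu\Bigl(\frac{mn}{d_1 d_2}\Bigr) \\
      &= \sum_{d_1|m,\; d_2|n} F(d_1) F(d_2)\, \mu\Bigl(\frac{m}{d_1}\Bigr) \mu\Bigl(\frac{n}{d_2}\Bigr) \\
      &= \sum_{d_1|m} F(d_1)\, \mu\Bigl(\frac{m}{d_1}\Bigr) \cdot \sum_{d_2|n} F(d_2)\, \mu\Bigl(\frac{n}{d_2}\Bigr) = f(m)f(n).
\end{align*}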


PROBLEMS 
*1. Show that
\[ \frac{1}{\varphi(n)} = \frac{1}{n} \sum_{d|n} \frac{\mu^2(d)}{\varphi(d)}. \]
2. Show that
\[ \sum_{d^2|n} \mu(d) = |\mu(n)|. \]
3. Let f be any number-theoretic function of two variables. Show that if
F is defined by the equation
\[ F(m, n) = \sum_{d_1|m,\; d_2|n} f(d_1, d_2), \]
then
\[ f(m, n) = \sum_{d_1|m,\; d_2|n} \mu(d_1)\mu(d_2)\, F\Bigl(\frac{m}{d_1}, \frac{n}{d_2}\Bigr). \]
*4. Let J_k(n) be the number of ordered sets of k equal or distinct positive
integers, none of which exceeds n and whose GCD is prime to n. Show, in
the order indicated, that

(a) \( \sum_{d|n} J_k(d) = n^k \),
(b) J_k is multiplicative,
(c) \( J_k(n) = n^k \prod_{p|n} \bigl(1 - \frac{1}{p^k}\bigr) \).

5. Let
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n \text{ is a power of any prime } p, \\ 0 & \text{otherwise.} \end{cases} \]
Show that
\[ \log n = \sum_{d|n} \Lambda(d), \]
and deduce that
\[ \sum_{d|n} \mu(d) \log d = -\Lambda(n). \]


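6. If 𝔣 is any multiplicative function, then the function 𝔣′ defined by the
equation
\[ \sum_{d|n} \mathfrak{f}(d)\, \mathfrak{f}'\Bigl(\frac{n}{d}\Bigr) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1, \end{cases} \]
is also multiplicative. In this notation, find φ′ and τ′.

7. If 𝔣 and 𝔣′ have the relation specified in Problem 6 and if
\[ F(n) = \sum_{d|n} f(d)\, \mathfrak{f}\Bigl(\frac{n}{d}\Bigr), \]
then
\[ f(n) = \sum_{d|n} F(d)\, \mathfrak{f}'\Bigl(\frac{n}{d}\Bigr). \]
8. Show that if f is multiplicative, then
\[ \sum_{d|n} \mu(d) f(d) = \prod_{p|n} (1 - f(p)). \]
[Hint: Show that the function on the left is multiplicative.]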

6-3 The function [x]. Another function which is of importance in
number theory is the function [x], introduced in the last chapter to
represent the largest integer not exceeding x. In other words, for
each real x, [x] is the unique integer such that

x − 1 < [x] ≤ x < [x] + 1.


For later purposes we list some of the properties of [x].

(a) x = [x] + θ, where 0 ≤ θ < 1. θ is called the fractional part
of x.

(b) [x + n] = [x] + n, if n is an integer.

(c) [x] + [−x] = 0 if x is an integer, and −1 otherwise.

(d) [x₁] + [x₂] ≤ [x₁ + x₂].

(e) [x/n] = [[x]/n] if n is a positive integer.

(f) 0 ≤ [x] − 2[x/2] ≤ 1. (Equivalently, [x] − 2[x/2] assumes
only the values 0 and 1.)

(g) The number of integers m for which x₁ < m ≤ x₂ is [x₂] − [x₁].

(h) The number of multiples of m which do not exceed x is [x/m].

(i) The least non-negative residue of a, modulo m, is the number
a′ defined by the equation
\[ a = m \Bigl[\frac{a}{m}\Bigr] + a'. \]
These properties may easily be proved using the definition of [x] and
the first property above.

Another quantity closely related to [x] is the nearest integer to x,
which is [x + ½]. Sometimes the quantity −[−x] is also useful; it
is the smallest integer not less than x.

In order to simplify the notation, summation signs will sometimes
be used with the real variable x as upper limit. In these cases, it is
understood that the summation variable takes values up to [x]; in
other words,
\[ \sum_{k=a}^{x} f(k) = \sum_{k=a}^{[x]} f(k). \]


The following relation between the greatest integer function and 
the factorial function will be of importance later. 


THEOREM 6-9. If n is a positive integer, the exponent of the highest
power of a prime p which divides n! is
\[ \sum_{k=1}^{\infty} \Bigl[\frac{n}{p^k}\Bigr]. \]
That is, if we set
\[ \sum_{k=1}^{\infty} \Bigl[\frac{n}{p^k}\Bigr] = E(p, n), \]
then p^{E(p,n)} ‖ n!.

Remark: The sum has, of course, only finitely many nonzero terms.


Proof: The multiples of p from among the numbers 1, 2, ..., n
are counted once each in [n/p], those which are also multiples of p² are
counted again in [n/p²], etc. Thus if p^r ‖ m, the total contribution to
the sum
\[ \sum_{k=1}^{\infty} \Bigl[\frac{n}{p^k}\Bigr] \]
from the number m is exactly r, as it should be.
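
The sum in Theorem 6-9 is finite and quick to evaluate. The following sketch compares it with the exponent read off directly from n!; the names `E` (mirroring the text's E(p, n)) and `exponent_in` are used here only for illustration, and integer floor division stands in for the bracket function.

```python
from math import factorial


def E(p: int, n: int) -> int:
    """Sum of [n / p^k] for k = 1, 2, ...; stops when p^k exceeds n."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


def exponent_in(p: int, m: int) -> int:
    """Largest e with p^e dividing m, found by repeated division."""
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e


print(E(2, 10), exponent_in(2, factorial(10)))
print(E(5, 100))
```

For n = 10 and p = 2 the terms are 5 + 2 + 1, matching 10! = 2⁸·3⁴·5²·7; the second line gives the familiar count of trailing zeros of 100!.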


PROBLEMS 


1. Carry out the proofs of the properties of [x] listed in the text.

2. Prove that [2x] + [2y] ≥ [x] + [y] + [x + y], where x and y are
arbitrary real numbers. [Hint: Consider separately the cases that neither,
one, or both of x − [x], y − [y] are greater than ½.]

3. Let f(x, n) be the number of integers less than or equal to x and prime
to n. Show that

(a) \( \sum_{d|n} f\bigl(\frac{x}{d}, \frac{n}{d}\bigr) = [x] \). [Parallel the proof of Theorem 3-9.]

(b) \( f(x, n) = \sum_{d|n} \mu(d) \bigl[\frac{x}{d}\bigr] \).

4. Let x be a number between 0 and 1. Let a₁ be the smallest positive
integer such that
\[ x_1 = x - \frac{1}{a_1} \ge 0, \]
let a₂ be the smallest positive integer such that
\[ x_2 = x_1 - \frac{1}{a_2} \ge 0, \]
etc. Show that this leads to a finite expansion
\[ x = \frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n} \]
(that is, that x_n = 0 for some n) if and only if x is rational.

5. (a) Show that
\[ \frac{(ab)!}{a!\,(b!)^a} \]
is an integer if a and b are positive integers. [Hint: Use induction on a.]
(b) Show that
\[ \frac{(2a)!\,(2b)!}{a!\,b!\,(a + b)!} \]
is an integer. [Hint: Use Problem 2.]


6-4 The symbols "O", "o", and "~". If we construct tables
of values of the common number-theoretic functions, we are
immediately struck by how erratically they behave. Thus τ(n) can
be arbitrarily large, since for example τ(2^m) = m + 1, and yet
τ(n) = 2 whenever n is prime. Neither φ nor σ varies quite so wildly,
in the sense that each of them definitely grows with n, but they are
still far from monotonic. It is one of the objects of this chapter to
see what can be said about the size of these and other functional
values simply in terms of the size of their arguments.

A very convenient notation has been introduced by Landau for use
in this connection. Let g(x) be defined and positive for all positive x.
Then if f(x) is any function defined on some unbounded set S of
positive numbers (which in all applications here will be either the
set of positive integers or the set of positive real numbers), and if
there is a number M such that
\[ \frac{|f(x)|}{g(x)} < M \]
for all sufficiently large x ∈ S, then we write f(x) = O(g(x)). (The
symbol ∈ means "is an element of.") If
\[ \lim_{\substack{x \to \infty \\ x \in S}} \frac{f(x)}{g(x)} = 0, \]
we write f(x) = o(g(x)), and if
\[ \lim_{\substack{x \to \infty \\ x \in S}} \frac{f(x)}{g(x)} = 1, \]
we write f(x) ~ g(x), and say that f(x) is asymptotically equal to
g(x). For example,

sin x = O(x),
sin x = o(x),
sin x = O(1),
φ(n) = O(n),
√x = o(x),
x^k = o(e^x)  for every constant k,
log^k x = o(x^α)  for every pair of constants α > 0 and k,
[x] ~ x.


Here each of the second and third equations gives more information
than the one preceding it; the first says that sin x does not grow any
faster than x itself, the second that it does not grow as fast, and the
third that sin x remains bounded as x increases. In the fourth
equation, O(n) could not be replaced by o(n), since φ(p) = p − 1 ~ p.

The purpose of introducing these symbols is that, by their use, a 
complicated expression can be replaced by its principal or largest 
term, plus a remainder or error term whose possible size is indicated. 
Retaining an estimate for the error term is necessary because if several 
such expressions are combined, one has eventually to show that the 
sum of the error terms is still of smaller order of magnitude than the 
principal term. This in turn makes it necessary to combine terms 
involving “O” and “‘o’’. The following abbreviated rules apply: 


(a) O(O(g(x))) = O(g(z)), 

(b) O(o(g(z))) = o(O(g(z))) = o(o(g(z))) = o(g(z)), 

(c) O(g(z)) + O(g(x)) = O(g(z)) + o(9(z)) = O(9@)), 

(d) o(g(x)) + o(g(x)) = o(g(z)), 

(e) {O(g(x))}? = O(9"(2)), 

(f) O(g(x)) - o(g()) = {o(g(x))}? = o(g’(z)). 
The meaning of the first statement, for example, is that if $f(x) =
O(g(x))$ and $h(x) = O(f(x))$, then $h(x) = O(g(x))$; this follows from
the fact that, if $0 < f(x) < M_1 g(x)$ and $|h(x)| < M_2 f(x)$, then
$|h(x)| < M_1 M_2 g(x)$. The other assertions are equally straightforward;
they need not be remembered explicitly, but are listed here to help
orient the student, who should analyze all of them. Notice that suitable
combinations of these rules give more general ones; for example,
rules (a) and (c) show that


\[ O(f(x)) + O(g(x)) = O(\max(f(x), g(x))). \]


A useful fact to remember is that the implication

\[ f(x) = O(g(x)) \quad \text{implies} \quad h(f(x)) = O(h(g(x))) \]

does not hold in general; a sufficient condition is that $h(kx) =
O(h(x))$ for every positive constant $k$, if $h(x), f(x) \to \infty$ as
$x \to \infty$. Thus if $f(x)$ is larger than some positive constant for
every $x > 0$, then $f(x) = O(g(x))$ implies that
$\log f(x) = O(\log g(x))$, but it does not imply that

\[ e^{f(x)} = O(e^{g(x)}), \]

since, for example, $\log x = O(\log \sqrt{x})$ but $x \ne O(\sqrt{x})$.
The situation is quite different for the “o” symbol. If $f(x) =
o(g(x))$, then

\[ e^{f(x)} = o(e^{g(x)}) \]

if $f(x)$ increases indefinitely with $x$, but the relation $\log f(x) =
o(\log g(x))$ may be false, e.g., if $f(x) = \sqrt{x}$, $g(x) = x$.
Another important point arises when we want to add together a
set of error terms, the number $a(x)$ of such terms being an increasing
function of $x$. It is not true without restriction that

\[ \sum_{k=1}^{a(x)} O(g_k(x)) = O\Big( \sum_{k=1}^{a(x)} g_k(x) \Big), \]

since, for example,

\[ x = O(x), \quad 2x = O(x), \quad \ldots, \]

but

\[ \sum_{k=1}^{[x]} kx \ne O\Big( \sum_{k=1}^{[x]} x \Big). \]


What is needed here, of course, is that the constants implied in the
symbols $O(g_k(x))$ all be bounded above by some number independent
of $k$. The corresponding principle for the “o” symbol is this: if
$f_k(x) = o(g_k(x))$, then we can write $f_k(x) = \epsilon_k(x) g_k(x)$, where
$\epsilon_k(x) \to 0$ as $x \to \infty$, for fixed $k$, and if
$\max(|\epsilon_1(x)|, \ldots, |\epsilon_{a(x)}(x)|) \to 0$ as $x \to \infty$, then

\[ \sum_{k=1}^{a(x)} f_k(x) = o\Big( \sum_{k=1}^{a(x)} g_k(x) \Big). \]


Turning now to the relation $f(x) \sim g(x)$, notice first that it is
equivalent to the equation $f(x) = g(x) + o(g(x))$. Hence if
$g(x) \to \infty$ as $x$ increases indefinitely, the difference $f(x) - g(x)$ need
not remain bounded; all that is asserted is that it is of smaller order
of magnitude than $g(x)$ itself.

To give more precise information about $f(x)$, we must consider not
$f(x)$ but $f(x) - g(x)$. As an example of this, consider the following
theorem, which is not strictly a number-theoretic result, but which will
be useful in what follows:


THEOREM 6-10. There is a constant $\gamma = 0.57721\ldots$ (called Euler’s
constant) such that

\[ \sum_{k=1}^{n} \frac{1}{k} = \log n + \gamma + O\Big(\frac{1}{n}\Big). \tag{1} \]

Remark: The relation

\[ \sum_{k=1}^{n} \frac{1}{k} - \log n \sim \gamma, \quad \text{or} \quad
   \lim_{n \to \infty} \Big( \sum_{k=1}^{n} \frac{1}{k} - \log n \Big) = \gamma, \tag{2} \]

is weaker than (1), since it says nothing about the error except that
it approaches zero. Notice that (2) is not equivalent to

\[ \sum_{k=1}^{n} \frac{1}{k} \sim \log n + \gamma \tag{3} \]

(that is, terms may not be “transposed” in an asymptotic relation),
for (3) has no more content than the simpler relation

\[ \sum_{k=1}^{n} \frac{1}{k} \sim \log n. \]
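As a numerical illustration of relation (1), the following sketch evaluates $n\,(\sum_{k \le n} 1/k - \log n - \gamma)$ for a few values of $n$; if (1) holds, this quantity should stay bounded. The function names and the decimal value used for $\gamma$ are assumptions made for the example, not part of the text.

```python
import math

# Decimal value of Euler's constant, supplied here as an assumption.
EULER_GAMMA = 0.5772156649015329


def harmonic(n: int) -> float:
    """Return the partial sum 1 + 1/2 + ... + 1/n."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def scaled_error(n: int) -> float:
    """Return n * (H_n - log n - gamma); relation (1) says this is bounded."""
    return n * (harmonic(n) - math.log(n) - EULER_GAMMA)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, scaled_error(n))
```

The printed values stay between 0 and 1, which is consistent with an error term of order $1/n$ but of course proves nothing.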


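Proof: Put

\[ \alpha_k = \log k - \log (k - 1) - \frac{1}{k}, \qquad k = 2, 3, \ldots, \]

and put

\[ \gamma_n = \sum_{k=1}^{n} \frac{1}{k} - \log n, \qquad n = 1, 2, \ldots, \]

so that

\[ 1 - \gamma_n = \sum_{k=2}^{n} \alpha_k, \qquad n = 2, 3, \ldots. \]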


FIGURE 6-1


Geometrically, the number $\alpha_k$ represents the difference between the
area of the region between the $x$-axis and the curve $y = 1/x$ in the
interval $k - 1 \le x \le k$, and the area of the rectangle inscribed in
this region; it is therefore positive. The regions having areas $\alpha_2$, $\alpha_3$,
and $\alpha_4$ are shaded in Fig. 6-1. If the regions having areas $\alpha_2, \ldots, \alpha_n$
are translated parallel to the $x$-axis into the interval $0 \le x \le 1$, it
becomes obvious that $0 < 1 - \gamma_n < 1$ and that $1 - \gamma_{n+1} > 1 - \gamma_n$,
for $n = 1, 2, \ldots$. Since every bounded increasing sequence is
convergent, we have that $\lim (1 - \gamma_n)$ exists; we call the limit $1 - \gamma$.
Referring again to the square $0 \le x \le 1$, $0 \le y \le 1$, we see that the
region whose area is

\[ \gamma_n - \gamma = (1 - \gamma) - (1 - \gamma_n) = \sum_{k=n+1}^{\infty} \alpha_k \]

is contained in the rectangle $0 \le x \le 1$, $0 \le y \le 1/n$, of area $1/n$,
so that

\[ \gamma_n - \gamma = O\Big(\frac{1}{n}\Big). \]

The proof is complete.

If $f(x) \sim g(x)$ and $g(x) \to \infty$ as $x \to \infty$, then $\log f(x) \sim \log g(x)$.
The relation $e^{f(x)} \sim e^{g(x)}$ is usually false, however; it is true only
when $f(x) - g(x) = o(1)$. Finally, under the above suppositions,
together with that of the continuity of $f$ and $g$, one may deduce that

\[ \int_a^x f(t)\, dt \sim \int_a^x g(t)\, dt \]

for sufficiently large fixed $a$, by applying L’Hôpital’s rule, but the
corresponding relation $f'(x) \sim g'(x)$ is not always valid.


PROBLEMS 


1. Carry out the proofs of all the unproved statements in this section. 
2. Show that 


. 1 
> | = |= x —+ 0(2), 
DiPj wipsse PIP; 
t#) 


tj =1 
tj 


where $p_i$ is the $i$th prime.
3. Show that if $f(x)$ tends to zero monotonically as $x$ increases without
limit, and is continuous for $x > 0$, and if the series

\[ \sum_{k=1}^{\infty} f(k) \]

diverges, then

\[ \sum_{k=1}^{n} f(k) \sim \int_1^n f(x)\, dx. \]

What can be said if the infinite series converges?
4. It is known that for every $n$, the $n$th prime $p_n$ is greater than $n \log n$.
Use this to show that if $\beta_n$ is defined by the equation

\[ \sum_{i=1}^{n} \frac{1}{p_i} = \log\log n + \beta_n, \qquad n = 3, 4, \ldots, \]

then $\beta_3, \beta_4, \ldots$ is a decreasing sequence.


6-5 The sieve of Eratosthenes. We now turn to the study of
$\pi(x)$, and shall obtain many of the classical elementary results
concerning the distribution of primes. None of these estimates is the best
of its kind that is known, but to obtain more accurate results
would require either too long a discussion to be worth while or the
use of tools not available here, as, for example, the theory of functions
of a complex variable. For many purposes our results are quite as
useful as the better estimates.


One method of estimating $\pi(x)$ is based upon the observation that
if $n > 1$ is less than or equal to $x$ and is not divisible by any prime less
than or equal to $\sqrt{x}$, then it is prime. Thus if we eliminate from
the integers between 1 and $x$ first all multiples of 2, then all
multiples of 3, then all multiples of 5, etc., until all multiples of all
primes less than or equal to $\sqrt{x}$ have been eliminated, then the
numbers remaining are prime. This method of eliminating the
composite numbers is known as the sieve of Eratosthenes; it has been
adapted by Viggo Brun and others into a powerful method of
estimating the number of integers in a certain interval having specified
divisibility properties with respect to a certain set of primes.
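The elimination process just described can be sketched in a few lines of Python. This is a common variant rather than a transcription of the text: it strikes out only the proper multiples of each prime $p \le \sqrt{x}$ (starting at $p^2$), so that the small primes themselves are kept in the output. The function name is hypothetical.

```python
import math


def sieve_of_eratosthenes(x: int) -> list[int]:
    """Return all primes <= x by striking out multiples of primes <= sqrt(x)."""
    if x < 2:
        return []
    is_candidate = [True] * (x + 1)
    is_candidate[0] = is_candidate[1] = False
    for p in range(2, math.isqrt(x) + 1):
        if is_candidate[p]:
            # Smaller multiples of p were already removed by smaller primes.
            for multiple in range(p * p, x + 1, p):
                is_candidate[multiple] = False
    return [n for n in range(2, x + 1) if is_candidate[n]]


if __name__ == "__main__":
    print(sieve_of_eratosthenes(50))
```

The length of the returned list is $\pi(x)$, so the same routine can be used to tabulate the prime-counting function for small $x$.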

We can modify the process just described by striking out the
multiples of the first $r$ primes $p_1, \ldots, p_r$, retaining $r$ as an
independent variable until the best choice for it can be clearly seen. If $p_r$
is not the largest prime less than or equal to $\sqrt{x}$, but is some smaller
prime, then of course it is no longer the case that all the integers
remaining are primes, but certainly none of the primes except $p_1, \ldots,
p_r$ have been removed. Thus if $A(x, r)$ is the number of integers
remaining after all multiples of $p_1, \ldots, p_r$ (including $p_1, \ldots, p_r$
themselves, of course) have been removed from the integers less than or
equal to $x$, then

\[ \pi(x) \le r + A(x, r). \]


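In order to estimate $A(x, r)$ we use Theorem 6-4. If we take the
$N = [x]$ objects to be the positive integers $\le x$ and take $S_k$, for
$1 \le k \le r$, to be the set of elements of $S$ divisible by $p_k$, then

\[ A(x, r) = [x] - \sum_{i=1}^{r} \Big[\frac{x}{p_i}\Big]
   + \sum_{1 \le i < j \le r} \Big[\frac{x}{p_i p_j}\Big]
   - \sum_{1 \le i < j < k \le r} \Big[\frac{x}{p_i p_j p_k}\Big]
   + \cdots + (-1)^r \Big[\frac{x}{p_1 p_2 \cdots p_r}\Big]. \]

The difference between this expression and

\[ x - \sum_{1 \le i \le r} \frac{x}{p_i} + \sum_{1 \le i < j \le r} \frac{x}{p_i p_j}
   - \cdots + (-1)^r \frac{x}{p_1 p_2 \cdots p_r}
   = x \prod_{i=1}^{r} \Big(1 - \frac{1}{p_i}\Big) \]

does not exceed

\[ 1 + \binom{r}{1} + \binom{r}{2} + \cdots + \binom{r}{r} = 2^r, \]

and consequently

\[ \pi(x) \le r + x \prod_{i=1}^{r} \Big(1 - \frac{1}{p_i}\Big) + 2^r. \]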
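The inclusion-exclusion expression for $A(x, r)$ can be checked against a direct count. The sketch below does this for an integer $x$ and a short list of primes; the function names are hypothetical, and the alternating sum has $2^r$ terms, so it is only practical for small $r$.

```python
from itertools import combinations
from math import prod


def remaining_direct(x: int, primes: list[int]) -> int:
    """Count n in 1..x that are divisible by none of the given primes."""
    return sum(1 for n in range(1, x + 1) if all(n % p for p in primes))


def remaining_inclusion_exclusion(x: int, primes: list[int]) -> int:
    """Evaluate [x] - sum [x/p_i] + sum [x/(p_i p_j)] - ... term by term."""
    total = 0
    for size in range(len(primes) + 1):
        sign = -1 if size % 2 else 1
        for subset in combinations(primes, size):
            # prod(()) == 1, so the empty subset contributes [x] itself.
            total += sign * (x // prod(subset))
    return total


if __name__ == "__main__":
    x, primes = 100, [2, 3, 5]
    print(remaining_direct(x, primes), remaining_inclusion_exclusion(x, primes))
    # Compare with the main term x * prod(1 - 1/p); the gap is at most 2**r.
    print(x * prod(1 - 1 / p for p in primes))
```

For $x = 100$ and the primes 2, 3, 5 both counts give 26, while the main term is about 26.67, well within the allowed discrepancy $2^3 = 8$.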


We need an estimate for the product occurring here. 


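THEOREM 6-11. If $x \ge 2$, then

\[ \prod_{p \le x} \Big(1 - \frac{1}{p}\Big) < \frac{1}{\log x}. \]

Proof: We have

\[ \prod_{p \le x} \frac{1}{1 - 1/p} = \prod_{p \le x} \Big(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots\Big), \]

and, by the Unique Factorization Theorem, this is the sum of the
reciprocals of all integers having only the primes not exceeding $x$ as
prime divisors. In particular, all integers less than or equal to $x$ are
of this form, and so

\[ \prod_{p \le x} \frac{1}{1 - 1/p} > \sum_{k=1}^{[x]} \frac{1}{k} > \int_1^{[x]+1} \frac{du}{u} > \log x, \]

and the theorem follows.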


We can now prove 


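THEOREM 6-12.

\[ \pi(x) = O\Big(\frac{x}{\log\log x}\Big). \]

Proof: As above,

\[ \pi(x) \le r + 2^r + x \prod_{i=1}^{r} \Big(1 - \frac{1}{p_i}\Big)
   < 2^{r+1} + x \prod_{i=1}^{r} \Big(1 - \frac{1}{p_i}\Big), \]

and by Theorem 6-11,

\[ \pi(x) < 2^{r+1} + \frac{x}{\log p_r}. \]

But $p_r > r$, and so

\[ \pi(x) < 2^{r+1} + \frac{x}{\log r}. \]

Taking $r = [\log x]$, so that $2^{r+1} \le 2 \cdot 2^{\log x}$ and
$\log r \sim \log\log x$, this becomes

\[ \pi(x) < \frac{x}{\log\, [\log x]} + 2 \cdot 2^{\log x}
   = O\Big(\frac{x}{\log\log x}\Big) + O(x^{\log 2}). \]

The last term is $O(x^{1-\epsilon})$ for some $\epsilon > 0$, and this is
$o(x/\log\log x)$. Hence

\[ \pi(x) = O\Big(\frac{x}{\log\log x}\Big) + o\Big(\frac{x}{\log\log x}\Big)
   = O\Big(\frac{x}{\log\log x}\Big). \]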

*PROBLEM


Show by a sieve argument that the number of square-free integers not
exceeding $x$ is less than

\[ x \prod_{p} \Big(1 - \frac{1}{p^2}\Big) + o(x). \]
P 


6-6 Sums involving primes. Theorems 6-11 and 6-12 bear a 
rather peculiar relation to each other: Theorem 6—11 was used in the 
proof of Theorem 6-12, yet the import of Theorem 6-11 is that the 
primes are not too infrequent, while that of Theorem 6—12 is that they 
are not too frequent. For if, for example, the primes were so scarce 
that pn > cn” for some positive number c, and for all n, the product 


\[ \prod_{p \le x} \Big(1 - \frac{1}{p}\Big) \]

would be bounded away from zero as $x \to \infty$, which it is not. It
follows from Theorem 6-12, however, that there is no constant $c'$
such that $p_n < c'n$ for all $n$. The following theorem is, in its
implications, analogous to Theorem 6-11.


THEOREM 6-13. The series $\sum_p 1/p$ diverges.

Proof: By Theorem 6-11,

\[ \log \prod_{p \le x} \Big(1 - \frac{1}{p}\Big) = \sum_{p \le x} \log\Big(1 - \frac{1}{p}\Big) < -\log\log x. \]

But since the curve $y = \log (1 + x)$ lies entirely above the curve
$y = 2x$ in the interval $-\tfrac{1}{2} \le x < 0$ (see Fig. 6-2), and since $p \ge 2$,
we have

\[ -\frac{2}{p} < \log\Big(1 - \frac{1}{p}\Big) \]

for all primes $p$, and so

\[ \sum_{p \le x} \frac{2}{p} > \log\log x. \]
In order to get more precise information about the behavior
of the sum

\[ \sum_{p \le x} \frac{1}{p} \]

we proceed in a rather roundabout way, making use of the connection
established in Theorem 6-9 between $n!$ and the primes not exceeding $n$.

FIGURE 6-2


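THEOREM 6-14.

\[ \sum_{p \le x} \frac{\log p}{p} \sim \log x. \]

Proof: By Theorem 6-9,

\[ n! = \prod_{p \le n} p^{[n/p] + [n/p^2] + \cdots}, \]

and so

\[ \log n! = \sum_{p \le n} \Big( \Big[\frac{n}{p}\Big] + \Big[\frac{n}{p^2}\Big] + \Big[\frac{n}{p^3}\Big] + \cdots \Big) \log p. \]

Now

\[ \sum_{p \le n} \Big[\frac{n}{p}\Big] \log p \le \sum_{p \le n} \frac{n}{p} \log p, \]

and

\[ \sum_{p \le n} \Big[\frac{n}{p}\Big] \log p > \sum_{p \le n} \Big(\frac{n}{p} - 1\Big) \log p
   = \sum_{p \le n} \frac{n}{p} \log p - \sum_{p \le n} \log p
   \ge \sum_{p \le n} \frac{n}{p} \log p - \log n \sum_{p \le n} 1. \]

Moreover,

\[ \sum_{p \le n} \Big( \Big[\frac{n}{p^2}\Big] + \Big[\frac{n}{p^3}\Big] + \cdots \Big) \log p
   \le n \sum_{p \le n} \Big( \frac{1}{p^2} + \frac{1}{p^3} + \cdots \Big) \log p
   = n \sum_{p \le n} \frac{\log p}{p(p - 1)}. \]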


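Thus

\[ \log n! = n \sum_{p \le n} \frac{\log p}{p} + O(\pi(n) \log n)
   + O\Big( n \sum_{p \le n} \Big( \frac{1}{p^2} + \frac{1}{p^3} + \cdots \Big) \log p \Big) \]

\[ \qquad = n \sum_{p \le n} \frac{\log p}{p} + O(\pi(n) \log n)
   + O\Big( n \sum_{p \le n} \frac{\log p}{p(p - 1)} \Big), \]

and by Theorem 6-12, and the fact that the series

\[ \sum_{k=2}^{\infty} \frac{\log k}{k(k - 1)} \]

converges, this gives

\[ \log n! = n \sum_{p \le n} \frac{\log p}{p} + O\Big(\frac{n \log n}{\log\log n}\Big) + O(n)
   = n \sum_{p \le n} \frac{\log p}{p} + O\Big(\frac{n \log n}{\log\log n}\Big). \]

FIGURE 6-3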

On the other hand, by comparing the area under the curve $y = \log x$
with the total area of the inscribed rectangles (see Fig. 6-3), we see
easily that

\[ \log n! = \sum_{m=1}^{n} \log m = \int_1^n \log t\, dt + O(\log n) = n \log n + O(n). \]

Combining the two estimates obtained for $\log n!$, we have

\[ n \sum_{p \le n} \frac{\log p}{p} + O\Big(\frac{n \log n}{\log\log n}\Big) = n \log n + O(n), \]

so that

\[ \sum_{p \le n} \frac{\log p}{p} = \log n + O\Big(\frac{\log n}{\log\log n}\Big). \]

This proves the theorem when $x$ is an integer $n$. But if $n < x < n + 1$,
then

\[ \sum_{p \le x} \frac{\log p}{p} = \sum_{p \le n} \frac{\log p}{p}
   = \log n + O\Big(\frac{\log n}{\log\log n}\Big)
   = \log x + O\Big(\frac{\log x}{\log\log x}\Big). \tag{4} \]
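Relation (4) can be watched numerically. The sketch below computes $\sum_{p \le x} (\log p)/p$ with a small sieve and compares it with $\log x$; the helper names are hypothetical, and the values of $x$ are chosen only for illustration.

```python
import math


def primes_up_to(x: int) -> list[int]:
    """Return the primes <= x using a simple sieve of Eratosthenes."""
    if x < 2:
        return []
    flags = [True] * (x + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            for multiple in range(p * p, x + 1, p):
                flags[multiple] = False
    return [n for n in range(2, x + 1) if flags[n]]


def log_p_over_p_sum(x: int) -> float:
    """Return the sum of (log p)/p over primes p <= x."""
    return math.fsum(math.log(p) / p for p in primes_up_to(x))


if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        s = log_p_over_p_sum(x)
        print(x, s, math.log(x), s / math.log(x))
```

The ratio in the last column creeps toward 1 quite slowly, in keeping with the fact that (4) only asserts an error of smaller order than $\log x$.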


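THEOREM 6-15. Suppose that $\lambda_1, \lambda_2, \ldots$ is a nondecreasing sequence
with limit infinity, that $c_1, c_2, \ldots$ is an arbitrary sequence of real or
complex numbers, and that $f(x)$ has a continuous derivative for $x \ge \lambda_1$.
Put

\[ C(x) = \sum_{\lambda_n \le x} c_n, \]

where the summation is over all $n$ for which $\lambda_n \le x$. Then for $x \ge \lambda_1$,

\[ \sum_{\lambda_n \le x} c_n f(\lambda_n) = C(x) f(x) - \int_{\lambda_1}^{x} C(t) f'(t)\, dt. \]

Proof: We have

\[ \sum_{\lambda_n \le x} c_n f(\lambda_n) = C(\lambda_1) f(\lambda_1) + (C(\lambda_2) - C(\lambda_1)) f(\lambda_2) + \cdots
   + (C(\lambda_\nu) - C(\lambda_{\nu-1})) f(\lambda_\nu), \]

where $\lambda_\nu$ is the greatest $\lambda_n$ which does not exceed $x$. Regrouping the
terms, we have

\[ \sum_{\lambda_n \le x} c_n f(\lambda_n) = C(\lambda_1)(f(\lambda_1) - f(\lambda_2)) + \cdots
   + C(\lambda_{\nu-1})(f(\lambda_{\nu-1}) - f(\lambda_\nu))
   + C(\lambda_\nu)(f(\lambda_\nu) - f(x)) + C(\lambda_\nu) f(x) \]

\[ \qquad = -\int_{\lambda_1}^{x} C(t) f'(t)\, dt + C(x) f(x), \]

since $C(t)$ is a step function, constant over each of the intervals
$(\lambda_{k-1}, \lambda_k)$ and over the interval $(\lambda_\nu, x)$.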
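The identity of Theorem 6-15 lends itself to a direct numerical check. In the sketch below the integral is approximated by a midpoint rule on each interval where $C(t)$ is constant; the function name, the step count, and the test data (the first few primes with $c_n = \log p_n / p_n$ and $f(t) = 1/\log t$) are choices made for this example only.

```python
import math


def abel_right_side(lams, cs, f, f_prime, x, steps_per_interval=2000):
    """Approximate C(x) f(x) - integral_{lam_1}^{x} C(t) f'(t) dt.

    lams must be nondecreasing and x >= lams[0]; the integral is evaluated
    with a midpoint rule on each interval where the step function C is constant.
    """
    points = [lam for lam in lams if lam <= x]
    partial = 0.0   # running value of C(t) on the current interval
    integral = 0.0
    for k, lam in enumerate(points):
        partial += cs[k]
        upper = points[k + 1] if k + 1 < len(points) else x
        width = (upper - lam) / steps_per_interval
        midpoint_sum = sum(
            f_prime(lam + (j + 0.5) * width) for j in range(steps_per_interval)
        )
        integral += partial * midpoint_sum * width
    return partial * f(x) - integral


if __name__ == "__main__":
    primes = [2, 3, 5, 7, 11, 13]
    cs = [math.log(p) / p for p in primes]
    left = sum(c / math.log(p) for c, p in zip(cs, primes))  # equals sum of 1/p
    right = abel_right_side(
        primes, cs,
        lambda t: 1 / math.log(t),
        lambda t: -1 / (t * math.log(t) ** 2),
        15.0,
    )
    print(left, right)
```

The two printed numbers agree to several decimal places; the small discrepancy comes from the quadrature, not from the identity.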


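THEOREM 6-16.

\[ \sum_{p \le x} \frac{1}{p} \sim \log\log x. \]

Proof: Take $\lambda_n = p_n$, $c_n = \log p_n / p_n$, $f(t) = 1/\log t$ in
Theorem 6-15. By (4),

\[ \sum_{p \le x} \frac{1}{p} = \sum_{p \le x} \frac{\log p}{p} \cdot \frac{1}{\log p}
   = \frac{1}{\log x} \sum_{p \le x} \frac{\log p}{p}
   + \int_2^x \Big( \sum_{p \le t} \frac{\log p}{p} \Big) \frac{dt}{t \log^2 t} \]

\[ \qquad = \frac{1}{\log x} \Big( \log x + O\Big(\frac{\log x}{\log\log x}\Big) \Big)
   + O(1) + \int_3^x \Big( \log t + O\Big(\frac{\log t}{\log\log t}\Big) \Big) \frac{dt}{t \log^2 t} \]

\[ \qquad = \log\log x + O(1) + \int_3^x O\Big( \frac{1}{t \log t \log\log t} \Big)\, dt, \]

where the part of the integral over $2 \le t \le 3$ has been absorbed into
the term $O(1)$. Now for some constant $M$,

\[ \Big| \int_3^x O\Big( \frac{1}{t \log t \log\log t} \Big)\, dt \Big|
   < M \int_3^x \frac{dt}{t \log t \log\log t} = O(\log\log\log x), \]

so that

\[ \sum_{p \le x} \frac{1}{p} = \log\log x + O(\log\log\log x), \]

which proves the theorem.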


*PROBLEM


Use Theorems 6-15 and 6-16 to show that

\[ \int_2^x \frac{\pi(t)}{t^2}\, dt = \sum_{p \le x} \frac{1}{p} + o(1) \sim \log\log x, \]

and deduce that for no positive constant $\delta$ is there a $T = T(\delta)$ such that for
all $t > T$,

\[ \pi(t) > (1 + \delta) \frac{t}{\log t}, \]

and that for no $\delta > 0$ is there a $T = T(\delta)$ such that for all $t > T$,

\[ \pi(t) < (1 - \delta) \frac{t}{\log t}. \]

This implies that, if

\[ \lim_{x \to \infty} \frac{\pi(x)}{x / \log x} \]

exists, it must be 1.


6-7 The order of $\pi(x)$. We now show that the actual order of
$\pi(x)$ is $x/\log x$.


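THEOREM 6-17. There are positive finite constants $c_1$ and $c_2$ such
that for $x \ge 2$,

\[ c_1 \frac{x}{\log x} < \pi(x) < c_2 \frac{x}{\log x}. \]

Proof: Take $n \ge 2$. Corresponding to each $p \le 2n$ there is a
unique integer $r_p$ such that $p^{r_p} \le 2n < p^{r_p + 1}$. We first prove that

\[ \prod_{n < p \le 2n} p \;\Big|\; \binom{2n}{n} \;\Big|\; \prod_{p \le 2n} p^{r_p}. \tag{5} \]

The first part is obvious, since any prime between $n$ and $2n$ occurs as
a factor of $(2n)!$ but does not occur in the denominator $(n!)^2$. For
the second part, we have that the highest power of $p$ which divides
the numerator $(2n)!$, by Theorem 6-9, has exponent

\[ \sum_{m=1}^{r_p} \Big[\frac{2n}{p^m}\Big], \]

while the highest power of $p$ which divides the denominator has
exponent

\[ 2 \sum_{m=1}^{r_p} \Big[\frac{n}{p^m}\Big], \]

so that the highest power of $p$ dividing $\binom{2n}{n}$ has exponent

\[ \sum_{m=1}^{r_p} \Big( \Big[\frac{2n}{p^m}\Big] - 2 \Big[\frac{n}{p^m}\Big] \Big) \le \sum_{m=1}^{r_p} 1 = r_p. \]

Here we have used property (f) of $[x]$, from the list in Section 6-3.
From (5) we get

\[ n^{\pi(2n) - \pi(n)} < \prod_{n < p \le 2n} p \le \binom{2n}{n} \le \prod_{p \le 2n} p^{r_p} \le (2n)^{\pi(2n)}, \]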


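whence

\[ (\pi(2n) - \pi(n)) \log n < \log \binom{2n}{n} \le \pi(2n) \log 2n. \]

Clearly $\binom{2n}{n} < 2^{2n}$ and also

\[ \binom{2n}{n} = \frac{(n + 1)(n + 2) \cdots (2n)}{1 \cdot 2 \cdots n}
   = \prod_{a=1}^{n} \frac{n + a}{a} \ge \prod_{a=1}^{n} 2 = 2^n; \]

thus

\[ (\pi(2n) - \pi(n)) \log n < 2n \log 2, \]

or*

\[ \pi(2n) - \pi(n) < c_3 \frac{n}{\log n}, \tag{6} \]

and

\[ \pi(2n) \log 2n \ge n \log 2, \]

or

\[ \pi(2n) > c_4 \frac{n}{\log n}. \tag{7} \]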

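If $x \ge 4$, we get from (7) that

\[ \pi(x) \ge \pi\Big( 2 \Big[\frac{x}{2}\Big] \Big) > c_4 \frac{[x/2]}{\log\, [x/2]} > c_5 \frac{x}{\log x}, \]

and since $\pi(x) \ge 1$ for $2 \le x \le 4$, $\pi(x) > c_1 (x/\log x)$ for $x \ge 2$.
If $y \ge 4$, we get from (6) that

\[ \pi(y) - \pi\Big(\frac{y}{2}\Big) = \pi(y) - \pi\Big( \Big[\frac{y}{2}\Big] \Big)
   \le 1 + \pi\Big( 2 \Big[\frac{y}{2}\Big] \Big) - \pi\Big( \Big[\frac{y}{2}\Big] \Big)
   < 1 + c_3 \frac{[y/2]}{\log\, [y/2]} < c_6 \frac{y}{\log y}, \]

and so for $y \ge 2$,

\[ \pi(y) - \pi\Big(\frac{y}{2}\Big) < c_7 \frac{y}{\log y}. \]

Using the trivial bound $\pi(y/2) \le y/2$, we get

\[ \pi(y) \log y - \pi\Big(\frac{y}{2}\Big) \log \frac{y}{2}
   = \Big\{ \pi(y) - \pi\Big(\frac{y}{2}\Big) \Big\} \log y + \pi\Big(\frac{y}{2}\Big) \log 2
   < \log y \cdot c_7 \frac{y}{\log y} + \frac{y}{2} \log 2 < c_8 y. \]


*Here $c_3, c_4, \ldots$ will denote certain positive constants, whose exact values
will be of no concern.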


If we put $y = x/2^m$ with $2^m \le x/2$ and $m \ge 0$, this becomes

\[ \pi\Big(\frac{x}{2^m}\Big) \log \frac{x}{2^m} - \pi\Big(\frac{x}{2^{m+1}}\Big) \log \frac{x}{2^{m+1}} < c_8 \frac{x}{2^m}, \]

and summing over all such $m$’s, we have

\[ \pi(x) \log x - \pi\Big(\frac{x}{2^{\mu+1}}\Big) \log \frac{x}{2^{\mu+1}} < c_2 x, \]

where $2^{\mu} \le x/2 < 2^{\mu+1}$. But $x/2^{\mu+1} < 2$, so that $\pi(x/2^{\mu+1}) = 0$,
and we have

\[ \pi(x) \log x < c_2 x, \]

which completes the proof.
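Theorem 6-17 asserts that $\pi(x) \log x / x$ stays between two positive constants. The following sketch tabulates this ratio for a few values of $x$; it illustrates the statement and plays no part in the proof, and the helper names are hypothetical.

```python
import math


def prime_count(x: int) -> int:
    """Return pi(x), the number of primes <= x, using a simple sieve."""
    if x < 2:
        return 0
    flags = [True] * (x + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            for multiple in range(p * p, x + 1, p):
                flags[multiple] = False
    return sum(flags)


def chebyshev_ratio(x: int) -> float:
    """Return pi(x) * log(x) / x, which Theorem 6-17 bounds above and below."""
    return prime_count(x) * math.log(x) / x


if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        print(x, prime_count(x), chebyshev_ratio(x))
```

In this range the ratio stays a little above 1, so constants such as $c_1 = 1$ and $c_2 = 1.3$ would do for these particular sample points; the theorem itself makes no claim about specific values.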


THEOREM 6-18. There are positive constants $c_9$, $c_{10}$ such that for
$r > 1$,

\[ c_9 r \log r < p_r < c_{10} r \log r. \]

Proof: Taking $x$ to be $p_r$ in Theorem 6-17, we get

\[ c_1 \frac{p_r}{\log p_r} < r < c_2 \frac{p_r}{\log p_r}. \]

The right-hand inequality gives immediately

\[ p_r > c_9 r \log p_r > c_9 r \log r. \]

Using the other inequality and the fact that $\log u = o(\sqrt{u})$, we have
that for $r > c_{11}$,

\[ \sqrt{p_r} < c_1 \frac{p_r}{\log p_r} < r, \]

\[ p_r < r^2, \]

\[ \log p_r < 2 \log r, \]

and so for $r > c_{11}$,

\[ p_r < \frac{1}{c_1} \cdot r \cdot 2 \log r, \]

whence finally $p_r < c_{10} r \log r$ for all $r > 1$.


We can use Theorem 6-17 to improve Theorems 6-14 and 6-16.
Examining the proof of Theorem 6-14, we see that the error term can
now be reduced to $O(n)$, since by Theorem 6-17, $\pi(n) \log n = O(n)$.
Thus we have


THEOREM 6-19.

\[ \sum_{p \le x} \frac{\log p}{p} = \log x + O(1). \]


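Following the argument used for Theorem 6-16, we now have

\[ \sum_{p \le x} \frac{1}{p} = \frac{1}{\log x} (\log x + O(1))
   + \int_2^x \log t \cdot \frac{dt}{t \log^2 t}
   + \int_2^x \Big( \sum_{p \le t} \frac{\log p}{p} - \log t \Big) \frac{dt}{t \log^2 t} \]

\[ \qquad = 1 + O\Big(\frac{1}{\log x}\Big) + \log\log x - \log\log 2
   + \int_2^{\infty} \Big( \sum_{p \le t} \frac{\log p}{p} - \log t \Big) \frac{dt}{t \log^2 t}
   - \int_x^{\infty} \frac{O(1)\, dt}{t \log^2 t}. \]

Here the first integral is convergent, and the second is clearly
$O(1/\log x)$. This proves


THEOREM 6-20. There is a constant $C$ such that

\[ \sum_{p \le x} \frac{1}{p} = \log\log x + C + O\Big(\frac{1}{\log x}\Big). \]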
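Theorem 6-20 can be illustrated by computing $\sum_{p \le x} 1/p - \log\log x$ for increasing $x$ and observing that it settles down. The sketch below does so; the helper names are hypothetical. The reference value 0.2615 used in the comment is the constant usually quoted in the literature for this limit and is not derived in the text.

```python
import math


def primes_up_to(x: int) -> list[int]:
    """Return the primes <= x using a simple sieve of Eratosthenes."""
    if x < 2:
        return []
    flags = [True] * (x + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            for multiple in range(p * p, x + 1, p):
                flags[multiple] = False
    return [n for n in range(2, x + 1) if flags[n]]


def reciprocal_sum_minus_loglog(x: int) -> float:
    """Return sum_{p <= x} 1/p - log log x, which should approach C."""
    total = math.fsum(1.0 / p for p in primes_up_to(x))
    return total - math.log(math.log(x))


if __name__ == "__main__":
    # The literature quotes C = 0.2615 (approximately); that value is an
    # outside fact used here only for comparison.
    for x in (10**3, 10**4, 10**5):
        print(x, reciprocal_sum_minus_loglog(x))
```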


PROBLEM


Apply Theorem 6-20 to show that for some constant $B$,

\[ \sum_{p \le x} \log\Big(1 - \frac{1}{p}\Big) = -\log\log x - B + O\Big(\frac{1}{\log x}\Big). \]

Deduce that

\[ \prod_{p \le x} \Big(1 - \frac{1}{p}\Big) = \frac{e^{-B}}{\log x} + O\Big(\frac{1}{\log^2 x}\Big). \]

By Theorem 6-11, $B \ge 0$; although we do not prove it, $B$ is Euler’s
constant. [Use the law of the mean to show that if $f(x) \to 0$ as $x \to \infty$, then
$e^{f(x)} = 1 + O(f(x))$.]


6-8 Bertrand’s conjecture. In 1845 J. Bertrand showed empirically
that there is a prime between $n$ and $2n$ for all $n$ greater than 1
and less than six million, and predicted that this is true for all positive
integral $n$. Chebyshev proved this in 1850, and indeed that for every
$\epsilon > \tfrac{1}{5}$ there is a $\xi$ such that for every $x > \xi$ there is a prime between
$x$ and $(1 + \epsilon)x$. Since that time, analytic methods have been used
to show that this last theorem is true for every $\epsilon > 0$. We shall
content ourselves here with a proof of Bertrand’s original conjecture.
The proof given is due to P. Erdős.

It is worth noting that Theorem 6-19 implies a weak form of the
theorem.


THEOREM 6-21. There exists a positive constant $c_{12}$ such that there
is a prime between $n$ and $c_{12} n$ for all $n$.


Proof: By Theorem 6-19, there is a constant $A$ such that

\[ \log n - A < \sum_{p \le n} \frac{\log p}{p} < \log n + A \]

for all $n$. Suppose that there is no prime between $n$ and $n e^{2A}$. Then

\[ \sum_{p \le n} \frac{\log p}{p} = \sum_{p \le n e^{2A}} \frac{\log p}{p}, \]

and so by Theorem 6-19 again,

\[ \sum_{p \le n} \frac{\log p}{p} > \log (n e^{2A}) - A = \log n + A. \]

With this contradiction the theorem is proved with $c_{12} = e^{2A}$.
For the proof of the more exact theorem we need two lemmas.


THEOREM 6-22.

\[ \prod_{p \le n} p < 4^n. \]

Proof: We use induction on $n$. The theorem is obvious if $n = 1$
or 2. Suppose it is true for $1, 2, \ldots, n - 1$, where $n \ge 3$. Then we
can restrict attention to odd $n$, since otherwise

\[ \prod_{p \le n} p = \prod_{p \le n-1} p < 4^{n-1} < 4^n, \]

so we can put $n = 2m + 1$. From its definition, the binomial
coefficient

\[ \binom{2m+1}{m} = \frac{(2m + 1)!}{m!\,(m + 1)!} \]

is divisible by every prime $p$ with $m + 2 \le p \le 2m + 1$. Hence

\[ \prod_{p \le 2m+1} p \le \binom{2m+1}{m} \prod_{p \le m+1} p < \binom{2m+1}{m} 4^{m+1}. \]

But the numbers

\[ \binom{2m+1}{m} \quad \text{and} \quad \binom{2m+1}{m+1} \]

are equal, and both occur in the expansion of $(1 + 1)^{2m+1}$, so that

\[ \binom{2m+1}{m} \le \tfrac{1}{2} \cdot 2^{2m+1} = 4^m, \]

and so

\[ \prod_{p \le 2m+1} p < 4^m \cdot 4^{m+1} = 4^{2m+1}. \]

The theorem follows by induction on $n$.
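The inequality of Theorem 6-22 is easy to test by machine for small $n$, since Python integers have arbitrary precision. The sketch below is only a sanity check over a finite range; the function names are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_product(n: int) -> int:
    """Return the product of all primes p <= n (1 if there are none)."""
    product = 1
    for p in range(2, n + 1):
        if is_prime(p):
            product *= p
    return product


if __name__ == "__main__":
    for n in (1, 2, 10, 50, 100):
        print(n, prime_product(n) < 4**n)
```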
THEOREM 6-23. If $n \ge 3$ and $\tfrac{2}{3} n < p \le n$, then $p \nmid \binom{2n}{n}$.


Proof: The restrictions on $n$ and $p$ are such that

(a) $p$ is greater than 2,

(b) $p$ and $2p$ are the only multiples of $p$ which are less than or
equal to $2n$, since $3p$ is greater than $2n$,

(c) $p$ itself is the only multiple of $p$ which is less than or equal to $n$.

From (a) and (b), $p^2 \,\|\, (2n)!$, and from (c), $p^2 \,\|\, (n!)^2$, so that
$p \nmid (2n)!/(n!)^2$.


THEOREM 6-24. For any positive integer $n$, there is a prime $p$ such
that $n < p \le 2n$.


Proof: This is true for $n = 1, 2, 3$. Assume the theorem false for a
certain integer $n \ge 4$. Then by Theorem 6-23, every prime which
divides $\binom{2n}{n}$ must be less than or equal to $2n/3$. Let $p$ be such a
prime, and suppose that $p^{\alpha} \,\|\, \binom{2n}{n}$. Then by the relation

\[ \binom{2n}{n} \;\Big|\; \prod_{p \le 2n} p^{r_p} \qquad (p^{r_p} \le 2n < p^{r_p + 1}), \]

established in the proof of Theorem 6-17, it follows that $p^{\alpha} \le 2n$.
Thus if $\alpha \ge 2$, then $p \le \sqrt{2n}$, and so there
are at most $[\sqrt{2n}]$ primes appearing in the prime-power factorization
of $\binom{2n}{n}$ with exponent larger than 1. Hence

\[ \binom{2n}{n} \le (2n)^{\sqrt{2n}} \prod_{p \le 2n/3} p. \]


But $\binom{2n}{n}$ is the largest of the $2n + 1$ terms in the expansion of
$(1 + 1)^{2n}$, so that

\[ 4^n < (2n + 1) \binom{2n}{n}, \]

and so

\[ \frac{4^n}{2n + 1} < (2n)^{\sqrt{2n}} \prod_{p \le 2n/3} p. \]

By Theorem 6-22, this implies that

\[ \frac{4^n}{2n + 1} < (2n)^{\sqrt{2n}} \cdot 4^{2n/3}, \]

and since $2n + 1 < 4n^2$, this gives

\[ 4^n < (2n)^{\sqrt{2n} + 2} \cdot 4^{2n/3}, \qquad \text{or} \qquad 4^{n/3} < (2n)^{\sqrt{2n} + 2}. \]

Taking logarithms, we have

\[ \frac{n \log 4}{3} < (\sqrt{2n} + 2) \log 2n. \]

This inequality is false for $n \ge 512$, so the theorem is true for
$n \ge 512$. But in the sequence of primes

2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 557,

each number is smaller than twice the one preceding it, and the
theorem is also true for all $n < 512$. It is therefore true for all $n$.
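Both halves of the concluding argument can be checked mechanically: that the logarithmic inequality fails from $n = 512$ onward (tested here only on a finite range), and that the listed chain of primes covers every smaller $n$. The sketch below is an illustration with hypothetical function names, not a substitute for the proof.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def has_prime_in_range(n: int) -> bool:
    """Return True if some prime p satisfies n < p <= 2n."""
    return any(is_prime(p) for p in range(n + 1, 2 * n + 1))


def log_inequality_holds(n: int) -> bool:
    """Test the inequality (n log 4)/3 < (sqrt(2n) + 2) log 2n from the proof."""
    return n * math.log(4) / 3 < (math.sqrt(2 * n) + 2) * math.log(2 * n)


if __name__ == "__main__":
    chain = [2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 557]
    print(all(b < 2 * a for a, b in zip(chain, chain[1:])))
    print(log_inequality_holds(512))
    print(all(has_prime_in_range(n) for n in range(1, 512)))
```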


PROBLEMS 


1. It follows from the Problem of Section 6-6 that in Theorem 6-17,
$c_1 \le 1 \le c_2$. If estimates had been made of $c_1$ and $c_2$ in the proof of
Theorem 6-17 (which would be simple to do), we would know, as a
consequence, two particular constants $c_1$ and $c_2$ for which the inequality of
Theorem 6-17 holds. Suppose that this is the case, and that $c_2/c_1 = \beta > 1$.
Show that if $\epsilon > 0$, then

\[ \frac{\pi((1 + \epsilon)x) - \pi(x)}{x/\log x} > c_1 (1 + \epsilon - \beta) + O\Big(\frac{1}{\log x}\Big). \]

Deduce that if $\epsilon > \beta - 1$, the number of primes between $x$ and $(1 + \epsilon)x$
tends to infinity with $x$.


2. Show that there is a constant $A > 0$ such that

\[ \sum_{x < p < x^2} \frac{1}{p} > A \]

for all sufficiently large $x$. Deduce that for each $\epsilon > 0$, there are infinitely
many pairs $p_n$ and $p_{n+1}$ of consecutive primes such that

\[ p_{n+1} < (1 + \epsilon) p_n. \]

6-9 The order of magnitude of $\varphi$, $\sigma$, and $\tau$. The quantity $\pi(x)$ is
reasonably well-behaved, and so one can make fairly precise
statements about its size as a function of $x$. This is not true of the other
functions we have considered, which vary much too wildly to permit
asymptotic approximations. There are, however, various weaker
statements which can be made about their size which still yield
considerable information.

Consider, for example, the quantity $\tau(n)$. A moment’s thought
shows that the number of divisors of $n$ is much smaller than $n$ itself,
for large $n$; it is to be expected that $\tau(n) = o(n)$. And while $\tau(n) = 2$
infinitely many times, it is also possible to make $\tau(n)$ arbitrarily large
for suitable $n$. Thus if the points $(n, \tau(n))$ are plotted in a coordinate
system, as in Fig. 6-4, there is a unique “lowest” polygonal path
extending upward and to the right from $(1, 1)$ which is concave
downward and is such that every point $(n, \tau(n))$ lies on or below it.
Suppose that this path is described by the equation $y = T(x)$.
While we shall not obtain an asymptotic estimate for $T(x)$, the
following theorem shows that it increases more rapidly than any
power of $\log x$, and less rapidly than any positive power of $x$.


FIGURE 6-4


THEOREM 6-25. (a) The relation $\tau(n) = O(\log^h n)$ is false
for every constant $h$.
(b) The relation $\tau(n) = O(n^{\delta})$ is true for every fixed $\delta > 0$.


Proof: (a) Let $n$ be any of the numbers $(2 \cdot 3 \cdots p_r)^m$,
$m = 1, 2, \ldots$; here $r$ is arbitrary but fixed. Then

\[ \tau(n) = \prod_{i=1}^{r} (m + 1) = (m + 1)^r > m^r. \]

But

\[ m = \frac{\log n}{\log (2 \cdot 3 \cdots p_r)}, \]

so that

\[ \tau(n) > \frac{\log^r n}{(\log (2 \cdot 3 \cdots p_r))^r} = c_{13} \log^r n, \]

where $c_{13} > 0$ is a constant depending only on $r$, and not on $n$.


(b) Let

\[ f(n) = \frac{\tau(n)}{n^{\delta}}; \]

then $f$ is multiplicative. But

\[ f(p^m) = \frac{m + 1}{p^{m\delta}} \le \frac{2m}{p^{m\delta}}
   = \frac{2}{\log p} \cdot \frac{\log p^m}{p^{m\delta}}
   \le \frac{2}{\log 2} \cdot \frac{\log p^m}{(p^m)^{\delta}}, \]

so that $f(p^m) \to 0$ as $p^m \to \infty$, i.e., as either $p$ or $m$, or both, increases.
This clearly implies that $f(n) \to 0$ as $n \to \infty$, which is the assertion.
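Alternative proof of (b): Let $\delta$ be positive, and let

\[ n = \prod_i p_i^{\alpha_i}. \]

Then

\[ \frac{\tau(n)}{n^{\delta}} = \prod_i \frac{\alpha_i + 1}{p_i^{\alpha_i \delta}}
   \le \prod_{p \mid n} \max_{x \ge 0} \Big( \frac{x + 1}{p^{x\delta}} \Big). \]

For fixed $\delta$, the quantity

\[ \max_{x \ge 0} \frac{x + 1}{p^{x\delta}} \]

is equal to 1 for sufficiently large $p$, and is never smaller than 1. Hence

\[ \frac{\tau(n)}{n^{\delta}} \le \prod_{p} \max_{x \ge 0} \Big( \frac{x + 1}{p^{x\delta}} \Big) = c_{\delta}, \]

and $c_{\delta}$ is a finite constant independent of $n$. Hence $\tau(n) = O(n^{\delta})$.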


By actually evaluating the constant $c_{\delta}$, one obtains inequalities
such as

\[ \tau(n) \le \sqrt{3n}, \qquad \tau(n) \le 4 \sqrt[3]{n}, \qquad \ldots, \]

which of course are very poor estimates for large $n$, but which are
sometimes more useful than the statement of Theorem 6-25, where
nothing is said about the behavior of $\tau(n)$ for all $n$, but only for large $n$.
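The two explicit inequalities can be tested over a finite range. To avoid floating-point comparisons, the sketch below checks the equivalent integer statements $\tau(n)^2 \le 3n$ and $\tau(n)^3 \le 64n$; the function name is hypothetical and the range is arbitrary.

```python
import math


def tau(n: int) -> int:
    """Return the number of positive divisors of n."""
    count = 0
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            # d and n // d are both divisors; count a square root only once.
            count += 1 if d * d == n else 2
    return count


if __name__ == "__main__":
    limit = 5000
    print(all(tau(n) ** 2 <= 3 * n for n in range(1, limit + 1)))
    print(all(tau(n) ** 3 <= 64 * n for n in range(1, limit + 1)))
    print(tau(12), math.sqrt(3 * 12))  # the first bound is attained at n = 12
```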

As regards the $\varphi$-function, we have the trivial upper bound
$\varphi(n) \le n - 1$ for $n > 1$, equality being attained whenever $n$ is prime.
The corresponding lower bound is less obvious.


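THEOREM 6-26. There is a positive constant $c_{14}$ such that for all
$n \ge 3$,

\[ \varphi(n) > \frac{c_{14}\, n}{\log\log n}. \]

Proof: We have

\[ \frac{\varphi(n)}{n} = \prod_{p \mid n} \Big(1 - \frac{1}{p}\Big), \]

so that

\[ \log \frac{\varphi(n)}{n} = \sum_{p \mid n} \log\Big(1 - \frac{1}{p}\Big)
   = -\sum_{p \mid n} \frac{1}{p} + \sum_{p \mid n} \Big\{ \log\Big(1 - \frac{1}{p}\Big) + \frac{1}{p} \Big\}
   > -\sum_{p \mid n} \frac{1}{p} - c_{15}, \]

since

\[ \sum_{p \mid n} \Big\{ \log\Big(1 - \frac{1}{p}\Big) + \frac{1}{p} \Big\}
   > -\sum_{p \mid n} \Big( \frac{1}{p^2} + \frac{1}{p^3} + \cdots \Big)
   > -\sum_{k=2}^{\infty} \frac{1}{k(k - 1)}. \]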

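Now let $p_1, \ldots, p_{r-\rho}$ be the primes less than $\log n$ which divide $n$,
so that

\[ \sum_{p \mid n} \frac{1}{p} = \sum_{k=1}^{r-\rho} \frac{1}{p_k} + \sum_{k=r-\rho+1}^{r} \frac{1}{p_k} = S_1 + S_2. \]

Then

\[ (\log n)^{\rho} \le p_{r-\rho+1}^{\rho} \le \prod_{k=r-\rho+1}^{r} p_k \le n, \]

so that

\[ \rho \le \frac{\log n}{\log\log n}, \]

and

\[ S_2 \le \rho \cdot \frac{1}{\log n} \le \frac{1}{\log\log n} < c_{16}. \]

By Theorem 6-20,

\[ S_1 < \log\log p_{r-\rho} + c_{17} < \log\log\log n + c_{17}. \]

Combining these results, we get

\[ \log \frac{\varphi(n)}{n} > -\log\log\log n - c_{18}, \]

and so

\[ \frac{\varphi(n)}{n} > \frac{c_{14}}{\log\log n}. \]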

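We can use Theorem 6-26 to obtain a corresponding upper bound
for $\sigma(n)$, with the help of the following theorem.


THEOREM 6-27. There is a positive constant $c_{19}$ such that

\[ c_{19} < \frac{\sigma(n) \varphi(n)}{n^2} < 1. \]

Proof: If $n = \prod p^{\alpha}$, then

\[ \sigma(n) \varphi(n) = \prod_{p^{\alpha} \| n} \Big( \frac{p^{\alpha+1} - 1}{p - 1} \Big) \cdot n \prod_{p \mid n} \Big(1 - \frac{1}{p}\Big)
   = n \prod_{p^{\alpha} \| n} \frac{p^{\alpha} (1 - p^{-(\alpha+1)})}{1 - 1/p} \cdot \prod_{p \mid n} \Big(1 - \frac{1}{p}\Big)
   = n^2 \prod_{p^{\alpha} \| n} (1 - p^{-(\alpha+1)}). \]

Here the coefficient of $n^2$ is clearly less than 1 and greater than or
equal to

\[ \prod_{p \mid n} \Big(1 - \frac{1}{p^2}\Big) \ge \prod_{k=2}^{n} \Big(1 - \frac{1}{k^2}\Big). \]

Now

\[ \log \prod_{k=2}^{n} \Big(1 - \frac{1}{k^2}\Big) = \sum_{k=2}^{n} \log\Big(1 - \frac{1}{k^2}\Big), \]

and for $0 < x < 1$,

\[ \log (1 - x) > -\frac{x}{1 - x}, \]

so that

\[ -\sum_{k=2}^{n} \frac{1}{k^2 - 1} < \sum_{k=2}^{n} \log\Big(1 - \frac{1}{k^2}\Big). \]

Since the first sum in this inequality tends to a limit as $n \to \infty$, it
follows that the above coefficient of $n^2$ is bounded away from zero,
and the theorem follows.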
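For a concrete feel for Theorem 6-27, the sketch below computes $\sigma(n)\varphi(n)$ directly from the definitions and checks, in integer arithmetic, that $n^2/2 < \sigma(n)\varphi(n) < n^2$ for $2 \le n \le 500$. The value $1/2$ is an assumption of this example: it is the limit of the telescoping product $\prod_{k=2}^{n}(1 - 1/k^2) = (n+1)/(2n)$, so it is one admissible choice of $c_{19}$, not a constant named in the text. The function names are hypothetical.

```python
from math import gcd


def sigma(n: int) -> int:
    """Return the sum of the positive divisors of n (brute force)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


def phi(n: int) -> int:
    """Return Euler's function: the count of 1 <= k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def within_bounds(n: int) -> bool:
    """Check n^2 / 2 < sigma(n) * phi(n) < n^2 using integers only."""
    value = sigma(n) * phi(n)
    return n * n < 2 * value and value < n * n


if __name__ == "__main__":
    print(all(within_bounds(n) for n in range(2, 501)))
    print(min(sigma(n) * phi(n) / n**2 for n in range(2, 501)))
```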
THEOREM 6-28. $\sigma(n) = O(n \log\log n)$.

Proof: By Theorem 6-26,

\[ \frac{\varphi(n)}{n} > \frac{c_{14}}{\log\log n}, \]

and by Theorem 6-27,

\[ \frac{\sigma(n)}{n} < \frac{n}{\varphi(n)} < \frac{\log\log n}{c_{14}} = O(\log\log n), \]

\[ \sigma(n) = O(n \log\log n). \]


PROBLEM

Show that there is an infinite sequence of positive integers $n_1, n_2, \ldots$ such
that

\[ \varphi(n_k) < \frac{c\, n_k}{\log\log n_k}, \qquad k = 1, 2, \ldots, \]

for some constant $c$.


6-10 Average order of magnitude. Another way of describing the 
behavior of a number-theoretic function is in terms of its average 
order, that is, in terms of the quantity 


\[ \frac{1}{n} \sum_{m=1}^{n} f(m). \]


Summing the values of a function has the effect of smoothing out its 
irregularities, so that it is frequently possible to make quite precise 
statements about the size of the sum. 


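THEOREM 6-29. If

\[ F(n) = \sum_{d \mid n} f(d), \]

then

\[ \sum_{m=1}^{n} F(m) = \sum_{m=1}^{n} \Big[\frac{n}{m}\Big] f(m). \]

Proof:

\[ \sum_{m=1}^{n} F(m) = \sum_{m=1}^{n} \sum_{d \mid m} f(d)
   = \sum_{d=1}^{n} \sum_{k=1}^{[n/d]} f(d)
   = \sum_{d=1}^{n} \Big[\frac{n}{d}\Big] f(d). \]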
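THEOREM 6-30. If

\[ F(n) = \sum_{d \mid n} f(d), \]

then

\[ \sum_{m=1}^{n} F(m) = \sum_{m=1}^{t} \Big[\frac{n}{m}\Big] f(m)
   + \sum_{1 \le u \le n/t} G\Big(\frac{n}{u}\Big) - \Big[\frac{n}{t}\Big] G(t), \]

where $t$ is any positive integer not exceeding $n$, and

\[ G(\xi) = G([\xi]) = \sum_{m \le \xi} f(m). \]

Proof: By the definition of $G$, $f(m) = G(m) - G(m - 1)$, and so
by partial summation (cf. the proof of Theorem 6-15),

\[ \sum_{m=1}^{n} F(m) = \sum_{m=1}^{t} \Big[\frac{n}{m}\Big] f(m) + \sum_{m=t+1}^{n} \Big[\frac{n}{m}\Big] f(m) \]

\[ \qquad = \sum_{m=1}^{t} \Big[\frac{n}{m}\Big] f(m) + \sum_{m=t+1}^{n} \Big[\frac{n}{m}\Big] (G(m) - G(m - 1)) \]

\[ \qquad = \sum_{m=1}^{t} \Big[\frac{n}{m}\Big] f(m)
   + \sum_{m=t}^{n} G(m) \Big( \Big[\frac{n}{m}\Big] - \Big[\frac{n}{m + 1}\Big] \Big)
   - \Big[\frac{n}{t}\Big] G(t). \]

As was noted earlier, $[n/m] - [n/(m + 1)]$ is the number of integers
$u$ such that

\[ \frac{n}{m + 1} < u \le \frac{n}{m}. \]

For each such $u$, $n/u - 1 < m \le n/u$, so that $m = [n/u]$. Hence

\[ \sum_{m=t}^{n} G(m) \Big( \Big[\frac{n}{m}\Big] - \Big[\frac{n}{m + 1}\Big] \Big)
   = \sum_{m=t}^{n} \; \sum_{n/(m+1) < u \le n/m} G\Big( \Big[\frac{n}{u}\Big] \Big)
   = \sum_{1 \le u \le n/t} G\Big(\frac{n}{u}\Big), \]

and the proof is complete.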


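THEOREM 6-31.

\[ \sum_{m=1}^{n} \tau(m) = n \log n + (2\gamma - 1) n + O(n^{1/2}), \]

where $\gamma$ is Euler’s constant.


Proof: Take $F = \tau$, $f = 1$, $t = [\sqrt{n}]$ in Theorem 6-30; then
$G(\xi) = [\xi]$ and

\[ \sum_{m=1}^{n} \tau(m) = \sum_{m=1}^{[\sqrt{n}]} \Big[\frac{n}{m}\Big]
   + \sum_{1 \le u \le n/t} \Big[\frac{n}{u}\Big] - \Big[\frac{n}{t}\Big] [\sqrt{n}] \]

\[ \qquad = 2 \sum_{m=1}^{[\sqrt{n}]} \Big[\frac{n}{m}\Big] - n + O(\sqrt{n}) \]

\[ \qquad = 2n \sum_{m=1}^{[\sqrt{n}]} \frac{1}{m} + O(\sqrt{n}) - n + O(\sqrt{n}) \]

\[ \qquad = 2n (\log \sqrt{n} + \gamma + O(1/\sqrt{n})) - n + O(\sqrt{n}) \]

\[ \qquad = n \log n + n(2\gamma - 1) + O(\sqrt{n}). \]

The term $O(\sqrt{n})$ in Theorem 6-31 is not the best possible estimate
of the error. The problem of increasing the accuracy of the estimate,
usually called Dirichlet’s divisor problem, has received a large amount
of study. It is known that $O(n^{1/2})$ can be replaced by $O(n^{1/3})$, but not
by $O(n^{1/4})$. The exact exponent, if such exists, is still unknown.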

For the purpose of illustrating the methods available for estimating
averages, we give a second proof of Theorem 6-31. By Theorem 6-29,

\[ \sum_{m=1}^{n} \tau(m) = \sum_{m=1}^{n} \Big[\frac{n}{m}\Big]. \]

FIGURE 6-5

Geometrically, this last sum represents the number of lattice points
$(x, y)$ (that is, points such that $x$ and $y$ are integers) with positive
coordinates, on or below the hyperbola $xy = n$, since for fixed
$x$ the number of integers $y$ such that $1 \le y \le n/x$ is exactly $[n/x]$.
By symmetry, the number of lattice points $(x, y)$ with $0 < xy \le n$,
$y > x$, is equal to the number with $0 < xy \le n$, $y < x$ (see Fig. 6-5).
Hence the number of points $(x, y)$ with $0 < xy \le n$ is twice the
number of those with $y > x$, plus the number with $y = x$:

\[ \sum_{m=1}^{n} \tau(m) = 2 \sum_{x=1}^{[\sqrt{n}]} \Big( \Big[\frac{n}{x}\Big] - x \Big) + [\sqrt{n}] \]

\[ \qquad = 2n \sum_{x=1}^{[\sqrt{n}]} \frac{1}{x} - 2 \sum_{x=1}^{[\sqrt{n}]} x + O(\sqrt{n}) \]

\[ \qquad = 2n (\log \sqrt{n} + \gamma + O(1/\sqrt{n})) - n + O(\sqrt{n}) \]

\[ \qquad = n \log n + (2\gamma - 1) n + O(\sqrt{n}). \]
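The lattice-point count translates directly into code. The sketch below evaluates $\sum_{m \le n} \tau(m)$ both as $\sum_{m \le n} [n/m]$ and by the symmetric formula $2\sum_{x \le \sqrt{n}} ([n/x] - x) + [\sqrt{n}]$, and then compares the result with the main terms $n \log n + (2\gamma - 1)n$. The function names and the decimal value of $\gamma$ are assumptions of the example.

```python
import math

# Decimal value of Euler's constant, supplied here as an assumption.
EULER_GAMMA = 0.5772156649015329


def divisor_sum_direct(n: int) -> int:
    """Return sum_{m <= n} tau(m) as sum_{m <= n} [n/m] (Theorem 6-29)."""
    return sum(n // m for m in range(1, n + 1))


def divisor_sum_hyperbola(n: int) -> int:
    """Count lattice points under xy = n using the symmetry about y = x."""
    root = math.isqrt(n)
    return 2 * sum(n // x - x for x in range(1, root + 1)) + root


def error_term(n: int) -> float:
    """Return sum tau(m) - (n log n + (2 gamma - 1) n)."""
    main = n * math.log(n) + (2 * EULER_GAMMA - 1) * n
    return divisor_sum_hyperbola(n) - main


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, divisor_sum_hyperbola(n), error_term(n), math.sqrt(n))
```

The symmetric formula needs only about $\sqrt{n}$ divisions, which is the practical counterpart of the error term $O(\sqrt{n})$ in the theorem.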


To get an asymptotic estimate for the sum of the first $n$ values of
the $\varphi$-function, we need a preliminary result concerning the famous
Riemann $\zeta$-function, which is defined for $s > 1$ by the equation

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]

For $s \le 1$ the function is, for the time being, undefined, since the
series fails to converge for such $s$. It is a well-known result, which we
shall use without proof (see Problem 6 below), that

\[ \zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}. \]


THEOREM 6-32. For $s > 1$,

\[ \frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}. \]

Proof: The series of the theorem and that for $\zeta(s)$ converge
absolutely for $s > 1$, so that they may be multiplied together by adding
all possible products of a term from one series and a term from the
other, and the resulting terms may be arranged in any convenient
order. Hence

\[ \sum_{m=1}^{\infty} \frac{1}{m^s} \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}
   = \sum_{m,n=1}^{\infty} \frac{\mu(n)}{(mn)^s}
   = \sum_{t=1}^{\infty} \frac{1}{t^s} \sum_{d \mid t} \mu(d) = 1. \]
THEOREM 6-33. >> o(m) = aid + O(n log n). 
m=1 us 
Proof: Since 
- u(d) 
em) 7 he me ad 
we have 
: u(d) ’ - n/a 
Eom) = mE = y de@) = Cu@ y a 
m=1 m=1 dm dd’<n @=1 @’=1 
2 n/dP +in/d 12 ny 
= Fw Oe = 5 Ew |5| 
* [n 
+0(=[3]) 
1 2 2 n 
ae 2 p(d) a +O (= “) + O(n log n) 
2 


= (5 u@)_ S. HY + O(n logm 
a= 


1 i d=n-+1 d* 


ae | 
art Xp + O(n log n) 


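THEOREM 6-34.

\[ \sum_{m=1}^{n} \sigma(m) = \frac{\pi^2 n^2}{12} + O(n \log n). \]

Proof:

\[ \sum_{m=1}^{n} \sigma(m) = \sum_{m=1}^{n} \sum_{d \mid m} d
   = \sum_{m=1}^{n} \sum_{d=1}^{[n/m]} d
   = \frac{1}{2} \sum_{m=1}^{n} \Big( \Big[\frac{n}{m}\Big]^2 + \Big[\frac{n}{m}\Big] \Big) \]

\[ \qquad = \frac{n^2}{2} \sum_{m=1}^{n} \frac{1}{m^2} + O(n \log n)
   = \frac{n^2}{2} (\zeta(2) - O(1/n)) + O(n \log n) \]

\[ \qquad = \frac{\pi^2}{12} n^2 + O(n \log n). \]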
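PROBLEMS


1. Show that

\[ \sum_{n \le x} \frac{\tau(n)}{n} = \frac{1}{2} \log^2 x + 2\gamma \log x + O(1). \]

[Hint: Use Theorems 6-15 and 6-31.]

2. Let $\delta(n)$ be the largest odd divisor of $n$. Show that

\[ \sum_{n \le x} \delta(n) = \frac{x^2}{3} + O(x) \qquad \text{and} \qquad
   \sum_{n \le x} \frac{\delta(n)}{n} = \frac{2x}{3} + O(1). \]

[Hint: Classify the numbers less than or equal to $x$ according to the
exponents of the powers of 2 dividing them, and show that

\[ \sum_{n \le x} \delta(n) = \sum_{n=0}^{(x-1)/2} (2n + 1) + \sum_{n=0}^{(x-2)/4} (2n + 1)
   + \sum_{n=0}^{(x-4)/8} (2n + 1) + \cdots.] \]

*3. Show that

\[ \sum_{n \le x} \frac{\varphi(n)}{n} \sim \frac{6x}{\pi^2}. \]

Deduce that the numbers $\varphi(n)/n$ are not uniformly distributed in the
interval $(0, 1)$. [A sequence $\{a_n\}$ of numbers in $(0, 1)$ is said to be uniformly
distributed if, for every $\alpha$ and $\beta$ with $0 \le \alpha < \beta \le 1$,

\[ \lim_{N \to \infty} \frac{1}{N} \sum_{\substack{n \le N \\ \alpha \le a_n \le \beta}} 1 = \beta - \alpha.] \]

4. Prove that

\[ \sum_{d=1}^{n} \varphi(d) \Big[\frac{n}{d}\Big] = \frac{n(n + 1)}{2}. \]

(Use Theorem 6-29.)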

*5. Using the result of Problem 1, Section 6-2, show that 
Se log H ay wee ) 
n<zY (n) no (n) 


where the accent designates summation over the square-free integers. 
*6. Prove that $\zeta(2) = \pi^2/6$ by evaluating the double integral

\[ I = \int_0^1 \int_0^1 \frac{dx\, dy}{1 - xy} \]

in two ways. [Hint: Obtain $I = \zeta(2)$ from the expansion

\[ \frac{1}{1 - xy} = 1 + xy + x^2 y^2 + \cdots, \]

which is valid for $|xy| < 1$. Then evaluate the integral directly by rotating
the coordinate system about the origin through 45° to obtain

\[ I = 4 \int_0^{1/\sqrt{2}} du \int_0^{u} \frac{dv}{2 - u^2 + v^2}
   + 4 \int_{1/\sqrt{2}}^{\sqrt{2}} du \int_0^{\sqrt{2} - u} \frac{dv}{2 - u^2 + v^2}, \]

integrating with respect to $v$, and making the substitution $u = \sqrt{2} \cos \theta$.]


6-11 An application. As was pointed out earlier, it is not known
whether 2 is a primitive root of infinitely many primes, nor has the
same question been settled for any other fixed integer. It is therefore
natural to ask, what can be said about the size of the smallest
primitive root $g_p$, as a function of p? Unfortunately, the little that is
known (such as the theorem that $g_p$ is less than $\sqrt{p} \log^{17} p$ for large p)
cannot be developed here; we content ourselves with an estimate
for the smallest quadratic nonresidue $n_p$ of p. Since $g_p$ is certainly
a nonresidue of p, any upper bound for $g_p$ would imply the same
bound for $n_p$, but not conversely.


THEOREM 6-35. There is a quadratic nonresidue of p between 1 and
$\sqrt{p}$, for all sufficiently large primes p.


Proof: To each pair of integers x, y with $(x, y) = 1$,
$0 < x < \sqrt{p}$, $0 < y < \sqrt{p}$, there corresponds an integer z, unique
modulo p, such that
\[ x \equiv yz \pmod p. \tag{8} \]
Different pairs x, y yield different z's. For if
\[ x_1 \equiv y_1 z \pmod p \quad \text{and} \quad x_2 \equiv y_2 z \pmod p, \]
then
\[ x_1 y_2 \equiv x_2 y_1 \pmod p, \]
whence $x_1 y_2 = x_2 y_1$. But this, together with the hypothesis that
$(x_1, y_1) = (x_2, y_2) = 1$, clearly implies that $x_1 = x_2$ and $y_1 = y_2$.

Now there are $2\varphi(m)$ ordered pairs x, y of relatively prime positive
integers whose largest element is m, if $1 < m < \sqrt{p}$, and one pair both
of whose elements are 1, so that the total number of pairs is
\[ 1 + 2 \sum_{m=2}^{[\sqrt{p}]} \varphi(m) = 2 \sum_{m=1}^{[\sqrt{p}]} \varphi(m) - 1. \]
If this number is larger than p/2, then there are more than $(p - 1)/2$
different residue classes z, and since there are only $(p - 1)/2$ quadratic
residues of p, at least one z must be a nonresidue. But then it
follows from (8) that one of x and y (each of which is smaller than
$\sqrt{p}$) is also a quadratic nonresidue of p. Thus the proof will be complete
when it is shown that, for all large p,
\[ 2 \sum_{m=1}^{[\sqrt{p}]} \varphi(m) - 1 > \frac{p}{2}. \tag{9} \]
Using the estimate of Theorem 6-33, we have that
\[ 2 \sum_{m=1}^{[\sqrt{p}]} \varphi(m) - 1 = \frac{6}{\pi^2} [\sqrt{p}]^2 + O(\sqrt{p} \log p) = \frac{6}{\pi^2}\, p + O(\sqrt{p} \log p). \]
Since $6/\pi^2 > 1/2$, it is clear that (9) holds for sufficiently large p, where
the lower bound of validity depends upon the implied constant in the
term $O(\sqrt{p} \log p)$.

By refining the argument slightly, it can be shown that the error 
term in Theorem 6-33 is numerically less than 1-nlogn. Using this, 
the phrase “‘sufficiently large p’’ of Theorem 6-35 can be replaced by 
“ny > 10%”, and finally, by reference to tables of the smallest primitive 


roots of primes less than 10%, it can be shown that $n_p < \sqrt{p}$ for every
$p \neq 2, 3, 7, 23$.
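The closing remark can be explored numerically. The sketch below (function names are illustrative, not from the text) finds the least quadratic nonresidue of an odd prime by Euler's criterion and lists the odd primes below a bound for which it exceeds $\sqrt{p}$; the prime 2 is left out because it has no quadratic nonresidue.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def least_nonresidue(p):
    """Least quadratic nonresidue of an odd prime p, via Euler's criterion."""
    for a in range(2, p):
        if pow(a, (p - 1) // 2, p) == p - 1:
            return a
    return None


def exceptions(limit):
    """Odd primes p < limit whose least nonresidue n_p satisfies n_p > sqrt(p)."""
    return [p for p in range(3, limit)
            if is_prime(p) and least_nonresidue(p) ** 2 > p]


if __name__ == "__main__":
    print(exceptions(2000))
```

Comparing $n_p^2$ with p avoids floating-point square roots; since p is prime, equality cannot occur.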




REFERENCES 


Section 6-1 


J. J. Sylvester (Mathematical Papers, vol. 4, New York: Cambridge 
University Press, 1912, pp. 588, 625-629) proved that an odd perfect 
number must have at least five distinct prime factors, and conjectured 
that it must have at least six. His conjecture was verified by U. Kühnel,
Mathematische Zeitschrift (Berlin) 52, 202-211 (1949). It follows that if 
there is an odd perfect number, it cannot be smaller than 2.2-10!. 

For a complete account of the Mersenne numbers, see R. C. Archibald, 
Scripta Mathematica 3, 112-119 (1935), and H. S. Uhler, Scripta Mathe- 
matica 18, 122-131, (1952). 


Section 6-8 


The proof of Bertrand’s conjecture given here is a modification of that 
given by P. Erdős, Acta Universitatis Szegediensis (Szeged, Hungary)
5, 194-198 (1932). The proof of Theorem 6-22 is simpler than that originally
given; it was found independently by Erdős and L. Kalmár in 1939,
but was not published. 


Section 6-11 


The inequality $g_p < \sqrt{p} \log^{17} p$ is due to Erdős, Bulletin of the American
Mathematical Society 51, 131-132 (1945). The inequality
\[ \left| \sum_{m=1}^{n} \varphi(m) - \frac{3n^2}{\pi^2} \right| < n \log n \]
is due to R. Tambs-Lyche, Kongelige Norske Videnskabers Selskabs Forhandlinger
(Trondhjem, Norway) 9, 58-61 (1936). For further results on
this error term, see Erdős and H. N. Shapiro, Canadian Journal of Mathematics
3, 375-385 (1951), and A. Z. Valfish, Akademiya Nauk Gruzinskoi
SSR Trudy Tbilisskogo Matematicheskogo Instituta imeni A. N. Razmadze
19, 1-31 (1953). [American Mathematical Society Translations, Series 2,
4, 1-30.]


CHAPTER 7 


SUMS OF SQUARES 


7-1 An approximation theorem. In this chapter we consider the
following questions: Given k, what integers can be represented as a 
sum of k squares? If an integer is so representable, how many repre- 
sentations are there? Both problems will be completely solved for 
k = 2, a partial answer to the first will be given for k = 3, and it will 
be shown that every integer is a sum of four squares (and hence of 
k squares, if k > 4). 

We shall need the following approximation theorem. 


THEOREM 7-1. If $\xi$ is a real number and t is a positive integer, there
are integers x and y such that
\[ \left| \xi - \frac{x}{y} \right| \le \frac{1}{y(t+1)}, \qquad 1 \le y \le t. \]


Proof: The $t + 1$ numbers
\[ 0 \cdot \xi - [0 \cdot \xi], \quad 1 \cdot \xi - [1 \cdot \xi], \quad \ldots, \quad t\xi - [t\xi] \]
all lie in the interval $0 \le u < 1$. Call them, in increasing order of
magnitude, $\alpha_0, \alpha_1, \ldots, \alpha_t$. Mark the numbers $\alpha_0, \ldots, \alpha_t$ on a circle
of unit circumference, that is, a unit interval on which 0 and 1 are
identified. Then the $t + 1$ differences
\[ \alpha_1 - \alpha_0, \quad \alpha_2 - \alpha_1, \quad \ldots, \quad \alpha_t - \alpha_{t-1}, \quad \alpha_0 - \alpha_t + 1 \]
are the lengths of the arcs of the circle between successive $\alpha$'s, and so
they are non-negative and
\[ (\alpha_1 - \alpha_0) + (\alpha_2 - \alpha_1) + \cdots + (\alpha_0 - \alpha_t + 1) = 1. \]
It follows that at least one of these $t + 1$ differences does not exceed
$(t + 1)^{-1}$. But each difference is of the form
\[ g_1 \xi - g_2 \xi - N, \]
where N is an integer, and we can take $y = |g_1 - g_2|$, $x = \pm N$.
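The statement of Theorem 7-1 can be illustrated by a direct search. The sketch below does not follow the pigeonhole argument of the proof; it simply tries every denominator y from 1 to t and keeps the one for which $y\xi$ is closest to an integer, which the theorem guarantees is close enough. The function name and the use of floating-point arithmetic are assumptions made for illustration.

```python
from math import floor, pi, sqrt


def dirichlet_approx(xi, t):
    """Return integers (x, y), 1 <= y <= t, minimizing |y*xi - x|.

    By Theorem 7-1 the minimum does not exceed 1/(t + 1).
    """
    best = None
    for y in range(1, t + 1):
        x = floor(xi * y + 0.5)          # nearest integer to y*xi
        err = abs(xi * y - x)
        if best is None or err < best[2]:
            best = (x, y, err)
    return best[0], best[1]


if __name__ == "__main__":
    for xi in (pi, sqrt(2)):
        x, y = dirichlet_approx(xi, 10)
        print(xi, x, y, abs(xi - x / y))
```

With $\xi = \pi$ and t = 10 the search returns the familiar approximation 22/7.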


7-2 Sums of two squares. A representation of the positive integer
n as a sum of two squares, say $n = x^2 + y^2$, will be termed proper or
improper according as $(x, y) = 1$ or $(x, y) > 1$. Throughout this
section, "representable" will mean "representable as a sum of two
squares," with an analogous meaning for "properly representable."


THEOREM 7-2. If $p \equiv 3 \pmod 4$ and $p \mid n$, then n has no proper
representation.

Proof: If $p \mid n$, $n = x^2 + y^2$, and $(x, y) = 1$, then $p \nmid x$ and $p \nmid y$.
Hence there is an integer u such that $y \equiv ux \pmod p$, and
\[ x^2 + y^2 \equiv x^2 + u^2 x^2 = x^2 (1 + u^2) \equiv 0 \pmod p, \]
so that
\[ u^2 \equiv -1 \pmod p. \]
It follows that $-1$ is a quadratic residue of p, and so either $p = 2$ or
$p \equiv 1 \pmod 4$, by Theorem 5-3.


THEOREM 7-3. An integer $n = \prod p_i^{\alpha_i}$ is representable if and only
if $\alpha_i$ is even for every i for which $p_i \equiv 3 \pmod 4$.

Proof: Suppose first that $p^{2k+1} \,\|\, n$, where $p \equiv 3 \pmod 4$, and suppose
that $n = x^2 + y^2$, where $(x, y) = d$ and $p^{j} \,\|\, d$. Then $x = d x_1$
and $y = d y_1$, where $(x_1, y_1) = 1$. But if $x_1^2 + y_1^2 = N$, then
$p^{2k-2j+1} \mid N$, and $2k - 2j + 1 > 0$; this contradicts Theorem 7-2.

It remains only to show that if $n = n_1 n_2^2$, where $n_1$ is square-free
and without prime divisors congruent to 3 (mod 4), then n is representable.
It suffices to prove $n_1$ representable. Since
\[ (x_1^2 + y_1^2)(x_2^2 + y_2^2) = (x_1 x_2 + y_1 y_2)^2 + (x_1 y_2 - x_2 y_1)^2, \tag{1} \]
the product of representable numbers is representable; this, together
with the fact that $1 = 1^2 + 0^2$ is representable, shows that we need
only consider the various prime factors of $n_1$. Since $2 = 1^2 + 1^2$ is
representable, it suffices to show that if $p \equiv 1 \pmod 4$, then p is
representable. For later purposes, however, we prove a more precise
result.
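The criterion of Theorem 7-3 is easy to compare with a brute-force search. In the sketch below (names are illustrative), `criterion` factors n by trial division and checks the parity of the exponent of each prime congruent to 3 (mod 4), while `is_sum_of_two_squares` searches for a representation directly.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """Brute force: is n = x^2 + y^2 for some integers x, y >= 0?"""
    for x in range(isqrt(n) + 1):
        y2 = n - x * x
        if isqrt(y2) ** 2 == y2:
            return True
    return False


def criterion(n):
    """Theorem 7-3: every prime p = 3 (mod 4) divides n to an even power."""
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if p % 4 == 3 and e % 2 == 1:
            return False
        p += 1
    # What is left is 1 or a single prime occurring to the first power.
    return n % 4 != 3


if __name__ == "__main__":
    print([n for n in range(1, 51) if criterion(n)])
```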


THEOREM 7-4. If $n > 1$ and $u^2 \equiv -1 \pmod n$, there are unique
integers x and y such that
\[ n = x^2 + y^2, \quad x > 0, \quad y > 0, \quad (x, y) = 1, \quad y \equiv ux \pmod n. \]

Remark: In case n is a prime $p \equiv 1 \pmod 4$, the congruence
$u^2 \equiv -1 \pmod p$ is solvable, and so p is representable.


Proof: The idea of the proof is to replace the equation $x^2 + y^2 = n$
by the equivalent conditions
\[ x^2 + y^2 \equiv 0 \pmod n, \qquad 0 < x^2 + y^2 < 2n. \]
To satisfy these conditions, we require that x be one of the numbers
$1, 2, \ldots, [\sqrt{n}]$, and then seek a y such that $y \equiv ux \pmod n$ (so that
$x^2 + y^2 \equiv 0 \pmod n$) and $|y| < \sqrt{n}$. But if $y \equiv ux \pmod n$,
then $y = ux + an$, and so we want a linear combination of u and n
to be small.

We apply Theorem 7-1, with
\[ \xi = -\frac{u}{n}, \qquad t = [\sqrt{n}], \]
and see that there are integers a and $x_1$ such that $1 \le x_1 \le [\sqrt{n}]$ and
\[ \left| -\frac{u}{n} - \frac{a}{x_1} \right| \le \frac{1}{x_1 (1 + [\sqrt{n}])} < \frac{1}{x_1 \sqrt{n}}, \]
so that
\[ |u x_1 + na| < \sqrt{n}. \]
Put $y_1 = u x_1 + na$. If $y_1 > 0$, put $y' = y_1$, $x' = x_1$. If $y_1 < 0$, then
$-y_1 \equiv -u x_1 \pmod n$, and since $u^2 \equiv -1 \pmod n$ we have
\[ u y_1 \equiv -x_1 \pmod n, \]
\[ u(-y_1) \equiv x_1 \pmod n, \]
and we take $x' = -y_1$, $y' = x_1$. In either case,
\[ y' \equiv u x' \pmod n, \quad x'^2 + y'^2 = n, \quad x' > 0, \quad y' > 0. \]
From the relation
\[ n = x_1^2 + y_1^2 = x_1^2 + u^2 x_1^2 + 2u x_1 n a + n^2 a^2 = x_1^2 (1 + u^2) + u x_1 a n + na(u x_1 + an), \]
we obtain
\[ 1 = \left( \frac{1 + u^2}{n}\, x_1 + ua \right) x_1 + a y_1, \]
so that $(x_1, y_1) = 1$, whence $(x', y') = 1$.




Finally, to prove the uniqueness, suppose that besides $x'$, $y'$, there
is a pair $x''$, $y''$ satisfying all conditions of the theorem. Then by
equation (1),
\[ n^2 = (x'^2 + y'^2)(x''^2 + y''^2) = (x'x'' + y'y'')^2 + (x'y'' - x''y')^2. \]
But
\[ x'x'' + y'y'' \equiv x'x'' + u^2 x'x'' = x'x''(1 + u^2) \equiv 0 \pmod n, \]
and since $x'x'' + y'y'' > 0$, it follows that $x'x'' + y'y'' = n$,
$x'y'' - x''y' = 0$. Hence $x'' = kx'$, $y'' = ky'$, and it is clear that
$k = 1$.
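The existence half of the proof is constructive, and a minimal sketch of it follows. Instead of invoking Theorem 7-1, the code simply tries each $x_1$ from 1 to $[\sqrt{n}]$ and reduces $u x_1$ to the residue of least absolute value modulo n; the sign adjustment at the end is the one used in the proof. The function name is illustrative.

```python
from math import gcd, isqrt


def proper_representation(n, u):
    """Given n > 1 and u with u*u = -1 (mod n), return the pair (x, y) of
    Theorem 7-4: n = x^2 + y^2, x > 0, y > 0, gcd(x, y) = 1, y = u*x (mod n)."""
    if n <= 1 or (u * u + 1) % n != 0:
        raise ValueError("need n > 1 and u^2 = -1 (mod n)")
    for x1 in range(1, isqrt(n) + 1):
        y1 = (u * x1) % n
        if y1 > n // 2:
            y1 -= n                      # residue of least absolute value
        if x1 * x1 + y1 * y1 == n:
            # If y1 < 0, swap as in the proof: x' = -y1, y' = x1.
            return (x1, y1) if y1 > 0 else (-y1, x1)
    raise AssertionError("no pair found; this would contradict Theorem 7-4")


if __name__ == "__main__":
    print(proper_representation(13, 5))   # u = 5: 25 = -1 (mod 13)
    print(proper_representation(65, 18))  # u = 18: 324 = -1 (mod 65)
```

Different square roots u of $-1$ modulo n lead to different pairs, which is the correspondence counted in Theorem 7-5.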


THEOREM 7-5. The number $P_2(n)$ of proper representations of n is
four times the number of solutions of the congruence $u^2 \equiv -1 \pmod n$.
Hence (by Theorems 5-1 and 5-2),
\[ P_2(n) = \begin{cases} 0 & \text{if } 4 \mid n \text{ or if some } p \equiv 3 \pmod 4 \text{ divides } n; \\ 2^{s+2} & \text{if } 4 \nmid n, \text{ no } p \equiv 3 \pmod 4 \text{ divides } n, \text{ and } s \text{ is the number of distinct odd prime divisors of } n. \end{cases} \]


Proof: The theorem is trivial if $n = 1$. If $n > 1$, then $xy \neq 0$, and
the number of representations is four times the number of positive
representations. To each u such that $u^2 \equiv -1 \pmod n$, there corresponds
exactly one proper representation with $x > 0$, $y > 0$,
$y \equiv ux \pmod n$. Conversely, if $x^2 + y^2 = n$ and $(x, y) = 1$, then
$(x, n) = 1$, so that the congruence $y \equiv ux \pmod n$ has a unique
solution u (mod n), and
\[ x^2 + y^2 \equiv x^2 (1 + u^2) \equiv 0 \pmod n, \]
which implies that $u^2 \equiv -1 \pmod n$.
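The first assertion of Theorem 7-5 can be checked by enumeration for small n. The sketch below (illustrative names) counts proper representations over all sign and order choices and compares the count with four times the number of residues u satisfying $u^2 \equiv -1 \pmod n$.

```python
from math import gcd, isqrt


def proper_count(n):
    """Number of ordered pairs (x, y) of integers with x^2 + y^2 = n, gcd 1."""
    r = isqrt(n)
    return sum(1
               for x in range(-r, r + 1)
               for y in range(-r, r + 1)
               if x * x + y * y == n and gcd(x, y) == 1)


def root_count(n):
    """Number of residues u modulo n with u^2 = -1 (mod n)."""
    return sum(1 for u in range(n) if (u * u + 1) % n == 0)


if __name__ == "__main__":
    for n in (1, 2, 5, 25, 65, 85):
        print(n, proper_count(n), 4 * root_count(n))
```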


COROLLARY. A prime $p \equiv 1 \pmod 4$ can be represented uniquely
(up to order and sign) as a sum of two squares.

This follows immediately from the theorem, for in this case $P_2(p)$
is 8, so that p has essentially only one proper representation. It
clearly has no improper representation.


PROBLEM 


Show that if n is a positive odd number of which $-2$ is a quadratic
residue, then there are integers x and y such that $2x^2 + y^2 = n$ and
$(x, y) = 1$.




7-3 The Gaussian integers. In order to obtain an expression for
the total number of representations of an integer as a sum of two
squares, we turn our attention momentarily to the arithmetic of the
so-called Gaussian integers: the complex numbers $a + bi$, where a
and b are ordinary (or rational) integers. In this section, Greek
letters will be used exclusively to designate Gaussian integers, and
the set of all such integers will be denoted by $R[i]$.

It is clear that if $\alpha$ and $\beta$ are in $R[i]$, then so also are $\alpha \pm \beta$ and $\alpha\beta$.
If $\alpha = a + bi$, then $(a + bi)(a - bi) = a^2 + b^2$ is called the norm
of $\alpha$, and designated by $N\alpha$. It is easily verified that $N\alpha\, N\beta = N\alpha\beta$.

An integer $\alpha$ whose reciprocal is also an integer is called a unit;
since
\[ \frac{1}{\alpha} = \frac{1}{a + bi} = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\, i, \]
$\alpha$ is a unit if and only if
\[ (a^2 + b^2) \mid a \quad \text{and} \quad (a^2 + b^2) \mid b. \]
If $a \neq 0$ and $b \neq 0$, then $a^2 + b^2 > \max(|a|, |b|)$; hence either a or b
must be zero. If $a = 0$, then $b^2 \mid b$, whence $b = \pm 1$. If $b = 0$, then
$a = \pm 1$. Thus the units are $\pm 1$, $\pm i$. The numbers $\pm\alpha$ and $\pm i\alpha$
are called the associates of $\alpha$. An integer is a unit if and only if its
norm is 1.

We say that $\alpha$ divides $\beta$, and write $\alpha \mid \beta$, if there is an integer $\gamma$ such
that $\beta = \alpha\gamma$. If $\alpha \mid \beta$, then $N\alpha \mid N\beta$. A unit divides any integer; if an
integer has no divisors other than its associates and the units, it is
said to be prime. Thus $1 + i$ is prime, since the equation
\[ 1 + i = (a + bi)(c + di) \]
implies
\[ N(1 + i) = 2 = N(a + bi)\, N(c + di), \]
which shows that either $N(a + bi)$ or $N(c + di)$ is 1, so that either
$a + bi$ or $c + di$ is a unit. More generally, this argument shows that
if $N\alpha$ is a rational prime, then $\alpha$ is a prime of $R[i]$. Thus, corresponding
to the representation $p = x^2 + y^2$ of a rational prime $p \equiv 1 \pmod 4$,
we have the decomposition
\[ p = (x + iy)(x - iy) \]
into primes of $R[i]$. In this case the factors are not associated: x and




y are relatively prime and numerically larger than 1, and are there- 
fore distinct, so that the supposition that a relation of the form 


(x+y) =2—y 
holds, yields 


m=1 and @tt= —7, 


which is impossible. 

The primes $p \equiv 3 \pmod 4$ do not split further in $R[i]$; that is, they
are also prime in the larger set. For if
\[ p = (a + bi)(c + di), \]
then
\[ p^2 = (a^2 + b^2)(c^2 + d^2). \]
But the only factorizations of $p^2$ are $p \cdot p$ and $1 \cdot p^2$, and it is impossible
that $a^2 + b^2 = c^2 + d^2 = p$, by Theorem 7-3; hence one of the
numbers $a + bi$, $c + di$ is a unit.

If $\alpha$ is not prime, it can be represented as a product of primes. For
then $\alpha = \beta\gamma$, where $N\beta > 1$ and $N\gamma > 1$, and consequently $N\beta < N\alpha$,
$N\gamma < N\alpha$. If $\beta$ and $\gamma$ are primes, we are through; if one is not, it
can be factored, with the factors having still smaller norms. The
process cannot continue indefinitely, since the norms are strictly
decreasing positive rational integers, and so we come eventually to a
prime factorization.

To show the uniqueness of this factorization, we use the following 
analog of Theorem 1-1. 


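THEOREM 7-6. If $\alpha$ and $\beta$ are integers of $R[i]$, and $\beta \neq 0$, then there
are integers $\rho$, $\kappa$ such that
\[ \alpha = \beta\kappa + \rho, \qquad N\rho < N\beta. \]

Proof: Since $\beta \neq 0$, we can write
\[ \frac{\alpha}{\beta} = \frac{a + bi}{c + di} = \frac{(a + bi)(c - di)}{c^2 + d^2} = A + Bi, \]
where A and B are rational numbers. Let x be the nearest integer to
A, and y the nearest integer to B, so that
\[ |A - x| \le \tfrac{1}{2}, \qquad |B - y| \le \tfrac{1}{2}. \]
Then
\[ \left| \frac{\alpha}{\beta} - (x + iy) \right| = |(A - x) + (B - y)i| = \bigl( (A - x)^2 + (B - y)^2 \bigr)^{1/2} \le \left( \tfrac{1}{4} + \tfrac{1}{4} \right)^{1/2} < 1. \]
Hence if we put
\[ x + iy = \kappa, \qquad \alpha - \beta(x + iy) = \rho, \]
then
\[ N\rho = N(\alpha - \kappa\beta) = N\beta \cdot N\left( \frac{\alpha}{\beta} - \kappa \right) < N\beta, \]
and $\kappa, \rho \in R[i]$. The proof is complete.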
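The proof of Theorem 7-6 translates directly into a division routine, and repeating it gives a Euclidean algorithm in $R[i]$. The following is a minimal sketch under the assumption that a Gaussian integer $a + bi$ is stored as the tuple `(a, b)`; the function names are illustrative. Exact integer arithmetic is used to pick the nearest integers to A and B.

```python
def norm(z):
    """Norm a^2 + b^2 of the Gaussian integer z = (a, b)."""
    a, b = z
    return a * a + b * b


def gauss_divmod(alpha, beta):
    """Return (kappa, rho) with alpha = beta*kappa + rho and N(rho) < N(beta)."""
    a, b = alpha
    c, d = beta
    n = c * c + d * d
    if n == 0:
        raise ZeroDivisionError("beta must be nonzero")
    # alpha/beta = (a + bi)(c - di)/n = (P + Qi)/n
    P = a * c + b * d
    Q = b * c - a * d
    x = (2 * P + n) // (2 * n)           # floor(P/n + 1/2): a nearest integer
    y = (2 * Q + n) // (2 * n)
    rho = (a - (c * x - d * y), b - (c * y + d * x))
    return (x, y), rho


def gauss_gcd(alpha, beta):
    """A greatest common divisor in R[i], determined up to a unit factor."""
    while beta != (0, 0):
        _, rho = gauss_divmod(alpha, beta)
        alpha, beta = beta, rho
    return alpha


if __name__ == "__main__":
    print(gauss_divmod((3, 0), (2, 1)))
    print(gauss_gcd((5, 0), (2, 1)))
```

Because the result of `gauss_gcd` is only determined up to one of the four units, comparing norms is the safer way to check it.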


Starting from Theorem 7-6, the development in Chapter 2 leading
to the Unique Factorization Theorem for rational integers can now be
paralleled, and we obtain a Unique Factorization Theorem for $R[i]$.

THEOREM 7-7. Every integer of $R[i]$ can be represented as a product
of primes. This representation is unique, aside from the order of the
factors and the presence of a set of units whose product is 1.

It can be shown that the primes of $R[i]$ are exactly the ones we have
already found, i.e., the associates of the following numbers:

(a) $1 + i$,

(b) $a + bi$, where $a^2 + b^2 = p \equiv 1 \pmod 4$,

(c) q, where $q \equiv 3 \pmod 4$.


PROBLEMS 


1. Use the ideas of this section to give a new proof of the Corollary to
Theorem 7-5.

2. Find the GCD (in $R[i]$) of $21 + 49i$ and $78 + 8i$.

3. Show that if $\alpha = a + bi$ is prime in $R[i]$, then either $(a, b) = 1$ or
$ab = 0$. Use this to deduce that the only primes in $R[i]$ are those listed
above. [Hint: If $(a, b) = 1$, show that $N\alpha = 2^t p_1 \cdots p_r$, where $t = 0$ or
1 and $p_i \equiv 1 \pmod 4$ for $i = 1, 2, \ldots, r$. Note also that $N\alpha = \alpha\bar\alpha$.]


7-4 The total number of representations. Suppose that n has the
factorization
\[ n = 2^{u} \prod_{p_j \equiv 1 \pmod 4} p_j^{t_j} \prod_{q_j \equiv 3 \pmod 4} q_j^{s_j}. \]
Put
\[ \prod_{p_j \equiv 1 \pmod 4} p_j^{t_j} = n', \qquad \prod_{q_j \equiv 3 \pmod 4} q_j^{s_j} = m. \]




THEOREM 7-8. If $n > 1$, then the number $r_2(n)$ of representations
of n as a sum of two squares is zero if m is not a square, and is $4\tau(n')$
if m is a square.

Proof: The case in which m is not a square is covered by Theorem
7-3. If m is a square, each $s_j$ is even, and we can put $s_j = 2r_j$. In
this case we shall prove the theorem by establishing, by means of the
identity $x^2 + y^2 = (x + iy)(x - iy)$, a one-to-one correspondence
between the various representations of n on the one hand, and the
factorizations of n as a product of two conjugate Gaussian integers,
on the other. We must count these factorizations. Since $1 + i = i(1 - i)$,
and $2 = i(1 - i)^2$, we can write the prime decomposition
of n in $R[i]$ in the form
\[ n = i^{u} (1 - i)^{2u} \prod \bigl( (a + bi)(a - bi) \bigr)^{t} \prod q^{2r}, \]
where the subscripts in the products have been omitted for clarity,
and where
\[ a > 0, \quad b > 0, \quad p = a^2 + b^2. \]


Then every divisor of n in $R[i]$ is of the form
\[ x + iy = i^{v} (1 - i)^{u_1} \prod \bigl( (a + bi)^{t_1} (a - bi)^{t_2} \bigr) \prod q^{r_1}, \]
where
\[ 0 \le v \le 3, \quad 0 \le u_1 \le 2u, \quad 0 \le t_1 \le t, \quad 0 \le t_2 \le t, \quad 0 \le r_1 \le 2r. \]
Not every such divisor leads to a representation; we must also
require that the complex conjugate,
\[ x - iy = (-i)^{v} (1 + i)^{u_1} \prod \bigl( (a + bi)^{t_2} (a - bi)^{t_1} \bigr) \prod q^{r_1} = i^{u_1 - v} (1 - i)^{u_1} \prod \bigl( (a + bi)^{t_2} (a - bi)^{t_1} \bigr) \prod q^{r_1}, \]
be such that $(x + iy)(x - iy) = n$. It is clear that this is the case
if and only if $u_1 = u$, $t_1 + t_2 = t$, $r_1 = r$. Since the powers of i are
periodic, with period 4, we obtain all the distinct factorizations of n
into conjugate factors by listing the numbers
\[ i^{v} (1 - i)^{u} \prod \bigl( (a + bi)^{t_1} (a - bi)^{t - t_1} \bigr) \prod q^{r}, \]
where u, t, and r are fixed, v is one of the integers 0, 1, 2, 3, and $t_1$ is
one of $0, 1, \ldots, t$. Their total number is $4 \prod (t + 1) = 4\tau(n')$.
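Theorem 7-8 can be compared with a direct count. In the sketch below (illustrative names), `r2_formula` removes the power of 2, accumulates $\tau(n')$ from the primes congruent to 1 (mod 4), and returns 0 as soon as a prime congruent to 3 (mod 4) occurs to an odd power; `r2_bruteforce` counts all ordered pairs of integers, signs included.

```python
from math import isqrt


def r2_bruteforce(n):
    """Number of ordered pairs (x, y) of integers with x^2 + y^2 = n."""
    r = isqrt(n)
    return sum(1
               for x in range(-r, r + 1)
               for y in range(-r, r + 1)
               if x * x + y * y == n)


def r2_formula(n):
    """Theorem 7-8: 4*tau(n') if m is a square, and 0 otherwise."""
    while n % 2 == 0:
        n //= 2
    tau = 1
    p = 3
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if p % 4 == 1:
            tau *= e + 1
        elif e % 2 == 1:       # a prime q = 3 (mod 4) to an odd power
            return 0
        p += 2
    if n > 1:                  # one remaining prime, to the first power
        if n % 4 == 1:
            tau *= 2
        else:
            return 0
    return 4 * tau


if __name__ == "__main__":
    for n in (1, 2, 5, 9, 25, 45, 65, 325):
        print(n, r2_bruteforce(n), r2_formula(n))
```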




PROBLEM 


Show that
\[ \sum_{m=1}^{n} r_2(m) = \pi n + O(\sqrt{n}). \]
[Hint: The sum on the left is the number of lattice points inside or on the
circle $x^2 + y^2 = n$. Associate each such point with the unit square of
which it is the lower left corner. The resulting region has a polygonal
boundary, no point of which is at distance greater than $\sqrt{2}$ from the
circle.]


7-5 Sums of three squares. The problem of the solvability of the
equation
\[ x^2 + y^2 + z^2 = n \tag{2} \]
is much more difficult than the corresponding question for the sum of
either two or four squares. The result is this: (2) is solvable if and
only if n is not of the form $4^t(8k + 7)$. We prove here only the
trivial half of this theorem, that if n is of the specified form then (2)
has no integral solutions.

Since a square can have only the values 0, 1, or 4 (mod 8), the sum
of three squares is congruent to 0, 1, 2, 3, 4, 5, or 6 (mod 8), so that
no $n \equiv 7 \pmod 8$ is so representable. If $4 \mid n$ and (2) holds, then
x, y, and z must all be even, so that n/4 must also be a sum of three
squares. Therefore n cannot be a power of 4 times a nonrepresentable
number.
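The stated criterion can be tested against a brute-force search over a small range. The sketch below (illustrative names) checks both directions numerically, although only the easy direction is proved in the text.

```python
from math import isqrt


def is_excluded_form(n):
    """True if n = 4^t (8k + 7) for some integers t, k >= 0."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7


def is_sum_of_three_squares(n):
    """Brute force: is n = x^2 + y^2 + z^2 with 0 <= x <= y?"""
    for x in range(isqrt(n) + 1):
        rest = n - x * x
        for y in range(x, isqrt(rest) + 1):
            z2 = rest - y * y
            if isqrt(z2) ** 2 == z2:
                return True
    return False


if __name__ == "__main__":
    print([n for n in range(1, 100) if not is_sum_of_three_squares(n)])
```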

It might be mentioned that one reason that problems concerning
three squares are more difficult than those concerning either two or
four is that there is no composition identity in this case analogous to
(1) or to that given below for four squares. Indeed, the fact that 3
and 5 are sums of three squares, while 15 is not, shows that no such
identity is possible.


7-6 Sums of four squares 


THEOREM 7-9. Every positive integer can be represented as a sum of
four squares.

Since
\[ (x_1^2 + x_2^2 + x_3^2 + x_4^2)(y_1^2 + y_2^2 + y_3^2 + y_4^2) \]
\[ = (x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4)^2 + (x_1 y_2 - x_2 y_1 + x_3 y_4 - x_4 y_3)^2 \]
\[ \quad + (x_1 y_3 - x_3 y_1 + x_4 y_2 - x_2 y_4)^2 + (x_1 y_4 - x_4 y_1 + x_2 y_3 - x_3 y_2)^2, \]
the product of representable numbers is representable. Since 1 is also
representable, it suffices to prove that every prime p is representable.
The proof, which uses the same idea as the proof of Theorem 7-4,
depends on the following theorem.
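The four-square composition identity is easy to mistype, so a mechanical check is worthwhile. The sketch below (the function name is illustrative) builds the four quantities on the right-hand side and verifies the identity on random integer inputs; this is a spot check, not a proof.

```python
import random


def four_square_product(x, y):
    """Return the four integers whose squares sum to
    (x1^2 + x2^2 + x3^2 + x4^2) * (y1^2 + y2^2 + y3^2 + y4^2)."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return (
        x1 * y1 + x2 * y2 + x3 * y3 + x4 * y4,
        x1 * y2 - x2 * y1 + x3 * y4 - x4 * y3,
        x1 * y3 - x3 * y1 + x4 * y2 - x2 * y4,
        x1 * y4 - x4 * y1 + x2 * y3 - x3 * y2,
    )


def square_sum(v):
    return sum(c * c for c in v)


if __name__ == "__main__":
    # 3 = 1 + 1 + 1 + 0 and 5 = 4 + 1 + 0 + 0 combine to a representation of 15.
    z = four_square_product((1, 1, 1, 0), (2, 1, 0, 0))
    print(z, square_sum(z))
```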


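THEOREM 7-10. Let r, s, and m be positive integers with $r < s$, and
let $\lambda_\sigma$ ($\sigma = 1, \ldots, s$) be positive numbers (not necessarily integers)
smaller than m, such that
\[ \lambda_1 \cdots \lambda_s > m^{r}. \]
Then the system of r linear congruences
\[ \sum_{\sigma=1}^{s} a_{\rho\sigma} x_\sigma \equiv 0 \pmod m, \qquad \rho = 1, \ldots, r, \]
where the a's are integers, has a solution in integers $x_1, \ldots, x_s$, not
all zero, such that $|x_\sigma| \le \lambda_\sigma$ for $\sigma = 1, \ldots, s$.

Proof: Put
\[ y_\rho = \sum_{\sigma=1}^{s} a_{\rho\sigma} x_\sigma \qquad \text{for } \rho = 1, \ldots, r. \]
For each $\sigma$, let $x_\sigma$ range over the integers $0, 1, \ldots, [\lambda_\sigma]$; this gives
$1 + [\lambda_\sigma]$ choices for $x_\sigma$ which are distinct (mod m), since $\lambda_\sigma < m$, and
there are
\[ \prod_{\sigma=1}^{s} (1 + [\lambda_\sigma]) \]
different s-tuples $x_1, \ldots, x_s$. Corresponding to each s-tuple $x_1, \ldots, x_s$
there is an r-tuple $y_1, \ldots, y_r$, and so we have found
\[ \prod_{\sigma=1}^{s} (1 + [\lambda_\sigma]) > \prod_{\sigma=1}^{s} \lambda_\sigma > m^{r} \]
r-tuples $y_1, \ldots, y_r$. But there are only $m^{r}$ integral r-tuples which
are distinct (mod m), so that there must be sets $y_1', \ldots, y_r'$ and
$y_1'', \ldots, y_r''$ such that $y_\rho' \equiv y_\rho'' \pmod m$ for $\rho = 1, \ldots, r$. If
these r-tuples correspond to $x_1', \ldots, x_s'$ and $x_1'', \ldots, x_s''$ respectively,
then
\[ y_\rho' - y_\rho'' = \sum_{\sigma=1}^{s} a_{\rho\sigma} (x_\sigma' - x_\sigma'') \equiv 0 \pmod m, \qquad \rho = 1, \ldots, r, \]
and not all of $x_\sigma' - x_\sigma''$ are zero, while $|x_\sigma' - x_\sigma''| \le \lambda_\sigma$ for $\sigma = 1, \ldots, s$.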


Proof of Theorem 7-9: If p is a prime, then the congruence
\[ x^2 + y^2 + 1 \equiv 0 \pmod p \]
has a solution. For if x and y range independently over the numbers
$0, 1, \ldots, (p - 1)/2$ (this for odd p; the assertion is clearly correct
for $p = 2$), then all the numbers $x^2$ are distinct (mod p), and the
same is true of the numbers $-(1 + y^2)$. For if $x_i^2 \equiv x_j^2 \pmod p$,
then $p \mid (x_i - x_j)(x_i + x_j)$. But $0 < x_i + x_j < p$, unless $x_i = x_j = 0$,
so $p \mid (x_i - x_j)$, $x_i \equiv x_j \pmod p$, and so $x_i = x_j$. But we have
altogether
\[ \frac{p + 1}{2} + \frac{p + 1}{2} = p + 1 \]
numbers $x^2$ and $-1 - y^2$, so some $x^2$ is congruent to some $-1 - y^2$,
modulo p, which is the assertion.
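The counting argument guarantees a solution with both variables at most $(p - 1)/2$, and such a solution is easy to find by table lookup. A minimal sketch, with illustrative names:

```python
def solve_two_squares_plus_one(p):
    """For a prime p, return (x, y) with x^2 + y^2 + 1 = 0 (mod p) and
    0 <= x, y <= (p - 1)/2 (for p = 2 the pair (1, 0) is returned)."""
    if p == 2:
        return (1, 0)
    half = (p - 1) // 2
    # The residues x^2 for 0 <= x <= half are pairwise distinct modulo p.
    square_root_of = {x * x % p: x for x in range(half + 1)}
    for y in range(half + 1):
        x = square_root_of.get((-1 - y * y) % p)
        if x is not None:
            return (x, y)
    raise AssertionError("no solution found; p is presumably not prime")


if __name__ == "__main__":
    for p in (2, 3, 7, 11, 23, 101):
        print(p, solve_two_squares_plus_one(p))
```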

Suppose that $a^2 + b^2 + 1 \equiv 0 \pmod p$. By Theorem 7-10, the
congruences
\[ x \equiv az + bt \pmod p, \]
\[ y \equiv bz - at \pmod p \]
have a nontrivial solution x, y, z, t with
\[ \max(|x|, |y|, |z|, |t|) \le \sqrt{p} + \varepsilon; \]
here $r = 2$, $s = 4$, $m = p$, and we have chosen $\lambda_\sigma = \sqrt{p} + \varepsilon$, where
$\varepsilon > 0$ is so small that $\sqrt{p} + \varepsilon < p$. Now x, y, z, and t are integers,
while $\sqrt{p}$ is not; if $\varepsilon$ is chosen so small that $\sqrt{p} + \varepsilon < 1 + [\sqrt{p}]$, it
follows that
\[ \max(|x|, |y|, |z|, |t|) < \sqrt{p}. \]
We have
\[ x^2 + y^2 \equiv (a^2 + b^2)(z^2 + t^2) \equiv -(z^2 + t^2) \pmod p, \]
while
\[ 0 < x^2 + y^2 + z^2 + t^2 < p + p + p + p = 4p, \]
so that
\[ x^2 + y^2 + z^2 + t^2 = Ap, \]
where $A = 1$, 2, or 3. If $A = 1$, we are finished. If $A = 2$, then x is
congruent to y, z, or t (mod 2). If $x \equiv y \pmod 2$, then $z \equiv t \pmod 2$,
and
\[ p = \left( \frac{x + y}{2} \right)^2 + \left( \frac{x - y}{2} \right)^2 + \left( \frac{z + t}{2} \right)^2 + \left( \frac{z - t}{2} \right)^2, \]
where the quantities in parentheses are integers.

In the case $A = 3$, we note first that $p = 3$ has a representation:
$3 = 1^2 + 1^2 + 1^2 + 0^2$, so that we need only consider $p \neq 3$. The square
of an integer is congruent to 0 or 1 (mod 3), and the equation
\[ x^2 + y^2 + z^2 + t^2 = 3p \]
implies that
\[ x^2 + y^2 + z^2 + t^2 \equiv 0 \pmod 3, \]
while
\[ x^2 + y^2 + z^2 + t^2 \not\equiv 0 \pmod 9. \]
By the congruence, one of the quantities, say x, is divisible by 3,
and either all the others are, or all are not, divisible by 3. Because of
the incongruence, $3 \nmid yzt$, so that y, z, and t are all congruent to
$\pm 1 \pmod 3$. Let $z'$ be that one of $\pm z$ such that $z' \equiv y \pmod 3$, and
let $t'$ be that one of $\pm t$ such that $t' \equiv y \pmod 3$. Then
\[ p = \left( \frac{y + z' + t'}{3} \right)^2 + \left( \frac{x + z' - t'}{3} \right)^2 + \left( \frac{x - y + t'}{3} \right)^2 + \left( \frac{x + y - z'}{3} \right)^2, \]
where the quantities in parentheses are integers. The proof is
complete.




REFERENCES 


Section 7-5 


A brief proof of the theorem that every number not of the form
$4^t(8k + 7)$ can be represented as the sum of three squares is to be found
in Landau, Handbuch der Lehre von der Verteilung der Primzahlen, Leipzig:
Teubner Gesellschaft, 1909, vol. 1, pp. 550-555.


Section 7-6 


Theorem 7-10, and its application to Theorem 7-9, are due to A. Brauer 
and T. L. Reynolds, Canadian Journal of Mathematics 3, 367-373 (1951). 


CHAPTER 8 


PELL’S EQUATION AND SOME APPLICATIONS 


8-1 Introduction. The Diophantine equation $x^2 - dy^2 = N$
(where N and d are integers), commonly known as Pell's equation,
was actually never considered by Pell; it was because of a mistake
on Euler's part that his name has been attached to it. The early
Greek and Indian mathematicians had considered special cases, but
Fermat was the first to deal systematically with it. He said that he
had shown, in the special case where $N = 1$ and $d > 0$ is not a perfect
square, that there are infinitely many integral solutions x, y; as
usual, he did not give a proof. The first published proof was given by
Lagrange, using the theory of continued fractions. Prior to this,
Euler had shown that there are infinitely many solutions if there is one.

Before beginning a systematic investigation, it might be worth
while to indicate some of the ways in which the equation arises and
some of the reasons, therefore, for its importance. On the one hand,
knowledge of the solutions of Pell's equation is essential in finding
integral solutions of the general quadratic equation
\[ ax^2 + bxy + cy^2 + dx + ey + f = 0, \]
in which $a, b, \ldots, f$ are integers. For, writing the left side as a
polynomial in x,
\[ ax^2 + (by + d)x + cy^2 + ey + f = 0, \]
it is clear that, if the equation is solvable for a certain y, the
discriminant
\[ (by + d)^2 - 4a(cy^2 + ey + f), \]
or, what is the same thing,
\[ (b^2 - 4ac)y^2 + (2bd - 4ae)y + d^2 - 4af, \]
must be a perfect square, say $z^2$. Putting
\[ b^2 - 4ac = p, \quad 2bd - 4ae = q, \quad d^2 - 4af = r, \]
we have
\[ py^2 + qy + r - z^2 = 0. \]
Again, the discriminant of this quadratic in y must be a perfect
square, say
\[ q^2 - 4p(r - z^2) = w^2. \]
Thus we are led to consider the Pell equation
\[ w^2 - 4pz^2 = q^2 - 4pr; \]
knowing solutions of it, we can, at any rate, obtain rational solutions
of the original equation.


As a second example, consider the real quadratic field $R(\sqrt{d})$, consisting
of all the numbers of the form
\[ a + b\sqrt{d}, \qquad d > 1, \ d \text{ square-free}, \]
where a and b are rational numbers (positive, negative or zero). Each
of these numbers with $b \neq 0$ is a zero of a unique quadratic polynomial
with relatively prime integral coefficients, that of $x^2$ being
positive. If the leading coefficient of the polynomial is 1, the corresponding
number is said to be an integer of the field. (Notice the
close analogy between the present discussion and that in the first
portion of Section 7-3. There, of course, we were working with the
integers of the nonreal quadratic field $R(\sqrt{-1})$.) Starting with
this notion of integer, it is possible to construct an arithmetic very
similar to that developed in Chapter 2 for the ordinary, or rational,
integers. Denote by $R[\sqrt{d}]$ the set of all integers of $R(\sqrt{d})$. If
$\alpha$, $\beta$, and $\alpha/\beta$ are in $R[\sqrt{d}]$, then we say that $\beta$ divides $\alpha$, and write
$\beta \mid \alpha$. If $\alpha \mid 1$, then $\alpha$ is a unit of $R[\sqrt{d}]$. If every factorization $\alpha = \beta\gamma$
into the product of integers of $R[\sqrt{d}]$ is such that either $\beta$ or $\gamma$ is a
unit, then $\alpha$ is prime in $R[\sqrt{d}]$. Finally, the norm $N\alpha$ of $\alpha = a + b\sqrt{d}$
is the product of $\alpha$ and its algebraic conjugate $\bar\alpha = a - b\sqrt{d}$, namely
$a^2 - db^2$. It is a rational integer if $\alpha$ is an integer of the field, and
$N\alpha \cdot N\beta = N(\alpha\beta)$ always.

Two complications now arise, however, which must be dealt with.
The more serious, with which we shall not be concerned for the time
being, is that the analog of the Unique Factorization Theorem does
not hold for every d, and it is necessary to introduce a rather sophisticated
mechanism to deal with this problem. The other complication
is that, in distinction to the set of rational integers where there
are only the two units $\pm 1$, a real quadratic field has infinitely many,
as will follow from the theorems of this chapter. For it is easily seen
that $\alpha$ is a unit of $R[\sqrt{d}]$ if and only if $N\alpha = \pm 1$, that is, if and only
if
\[ a^2 - db^2 = \pm 1. \]
Since it can be shown that $a + b\sqrt{d}$ is a quadratic integer if and only
if a and b are both rational integers (or in case $d \equiv 1 \pmod 4$, a and b
may also be halves of odd integers), the infinitude of units follows
from Lagrange's theorem concerning Pell's equation. Clearly, knowledge
of the structure of this set of units will depend on a thorough
analysis of that equation for $N = \pm 1$ and $\pm 4$.

We shall give a third application of Pell's equation, this time to the
minimum of an indefinite quadratic form, later in this chapter. There
will be others in Volume II.


PROBLEMS 


*1. Let d be greater than 1 and square-free, and let $\alpha$ be in $R[\sqrt{d}]$. Show
that if $d \not\equiv 1 \pmod 4$, then
\[ \alpha = a + b\sqrt{d}, \qquad a, b \text{ integers}, \]
while if $d \equiv 1 \pmod 4$, then
\[ \alpha = \frac{a + b\sqrt{d}}{2}, \qquad a, b \text{ integers such that } a \equiv b \pmod 2. \]
[Hint: First show that if $\bar\alpha$ is the conjugate of $\alpha$ in $R(\sqrt{d})$, then $\alpha + \bar\alpha$ and
$\alpha\bar\alpha$, and therefore also $4\alpha\bar\alpha - (\alpha + \bar\alpha)^2$, must be in $R[\sqrt{d}]$.]

2. Show that $\alpha$ is a unit of $R[\sqrt{d}]$ if and only if $N\alpha = \pm 1$.

3. Find some solutions of the Diophantine equation
\[ x^2 + 6xy - 4y^2 - 4x - 12y - 19 = 0. \]


8-2 The case $N = \pm 1$. For the present we shall concern ourselves
with the equation
\[ x^2 - dy^2 = 1. \tag{1} \]
The case in which d is a negative integer is easily dealt with: if
$d = -1$, then the only solutions are $\pm 1, 0$ and $0, \pm 1$, while if
$d < -1$, the only solutions are $\pm 1, 0$. So from now on we may
restrict attention to equations of the form (1) with $d > 0$. If d is a
square, say $d = d'^2$, then (1) can be written as
\[ x^2 - (d'y)^2 = 1, \]
and since the only two squares which differ by 1 are 0 and 1, the only
solutions in this case are $\pm 1, 0$. Suppose then that d is not a square.


THEOREM 8-1. For any irrational number $\xi$, the inequality
\[ |x - \xi y| < \frac{1}{y} \tag{2} \]
has infinitely many solutions.

Proof: According to Theorem 7-1, if $\xi$ is irrational the inequalities
\[ 0 < |x - \xi y| < \frac{1}{t}, \qquad 1 \le y \le t, \tag{3} \]
have a solution for each positive integer t. It is clear that each solution
of (3) is also a solution of (2). Taking $t = 1$ in (3) gives a
solution $x_1$, $y_1$ of (2). Then for suitable $t_1 > 1$,
\[ |x_1 - \xi y_1| > \frac{1}{t_1}, \]
and taking $t = t_1$ in (3) gives a solution $x_2$, $y_2$ of (2). Since
\[ |x_2 - \xi y_2| < |x_1 - \xi y_1|, \]
the two solutions so far found are distinct. Now choose $t_2 > t_1$ so
that
\[ |x_2 - \xi y_2| > \frac{1}{t_2}, \]
and for $t = t_2$ find $x_3$, $y_3$. Clearly this procedure can be continued
indefinitely, yielding infinitely many solutions of (2).
THEOREM 8-2. There are infinitely many solutions of the equation
\[ x^2 - dy^2 = k \tag{4} \]
in positive integers x, y for some k with $|k| < 1 + 2\sqrt{d}$.

Proof: If x, y is a solution of (2) with $\xi = \sqrt{d}$, then
\[ |x + y\sqrt{d}| = |x - y\sqrt{d} + 2y\sqrt{d}| < \frac{1}{y} + 2y\sqrt{d} \le (1 + 2\sqrt{d})\, y, \]
and so
\[ |x^2 - dy^2| < \frac{1}{y} (1 + 2\sqrt{d})\, y = 1 + 2\sqrt{d}. \]
Since there are infinitely many distinct pairs x, y available, but only
finitely many integers numerically smaller than $1 + 2\sqrt{d}$, infinitely
many of the numbers $x^2 - dy^2$ must have a common value, which is
the theorem.


THEOREM 8-3. Equation (1) has at least one solution with $y \neq 0$.

Proof: Separate the infinitely many solutions of (4) into $k^2$ classes,
putting two solutions $x_1, y_1$ and $x_2, y_2$ in the same class if and only if
$x_1 \equiv x_2 \pmod k$ and $y_1 \equiv y_2 \pmod k$. Then some class contains at
least two different solutions, say $x_1, y_1$ and $x_2, y_2$, with $x_1 x_2 > 0$. Put
\[ x = \frac{x_1 x_2 - d y_1 y_2}{k}, \qquad y = \frac{x_1 y_2 - x_2 y_1}{k}; \]
we shall show that x and y are integers with $y \neq 0$ for which $x^2 - dy^2 = 1$.
It follows immediately from the congruences
\[ x_1 \equiv x_2 \pmod k, \qquad y_1 \equiv y_2 \pmod k, \]
that
\[ x_1 y_2 \equiv x_2 y_1 \pmod k, \]
and so y is an integer. Also, from these congruences and from (4),
\[ x_1 x_2 - d y_1 y_2 \equiv x_1^2 - d y_1^2 = k \equiv 0 \pmod k, \]
and so x is an integer. Furthermore
\[ x^2 - dy^2 = \frac{1}{k^2} \bigl( (x_1 x_2 - d y_1 y_2)^2 - d (x_1 y_2 - x_2 y_1)^2 \bigr) \]
\[ = \frac{1}{k^2} \bigl( x_1^2 x_2^2 - d x_1^2 y_2^2 + d^2 y_1^2 y_2^2 - d x_2^2 y_1^2 \bigr) \]
\[ = \frac{1}{k^2} (x_1^2 - d y_1^2)(x_2^2 - d y_2^2) = 1. \]
Finally, if $y = 0$, then
\[ x_1 y_2 = x_2 y_1, \]
so that for some a, $x_1 = a x_2$ and $y_1 = a y_2$. But since $x_1, y_1$ and $x_2, y_2$
are both solutions of (4), it must be that $a = 1$, contrary to the assumption
that $x_1, y_1$ and $x_2, y_2$ are different solutions.


THEOREM 8-4. If $x_1, y_1$ and $x_2, y_2$ are solutions of the Pell equation (1),
then so also are the integers x, y defined by the equation
\[ (x_1 + y_1\sqrt{d})(x_2 + y_2\sqrt{d}) = x + y\sqrt{d}. \tag{5} \]

Proof: It follows from (5) that also
\[ (x_1 - y_1\sqrt{d})(x_2 - y_2\sqrt{d}) = x - y\sqrt{d}, \]
and multiplying corresponding sides of these equations gives
\[ x^2 - dy^2 = (x_1^2 - d y_1^2)(x_2^2 - d y_2^2) = 1. \]

In particular, it follows from Theorems 8-3 and 8-4, taking
$x_1 = x_2$, $y_1 = y_2$, that the numbers x, y defined by
\[ (x_1 + y_1\sqrt{d})^{n} = x + y\sqrt{d} \]
form a solution for every positive value of n; that this is also true for
negative values of n follows from the fact that
\[ \frac{1}{x_1 + y_1\sqrt{d}} = x_1 - y_1\sqrt{d}. \]
We shall now show that a general solution can be obtained in this
fashion. For brevity, we shall refer to $x + y\sqrt{d}$, as well as x, y, as a
solution of equation (1). It will be called positive if $x > 0$ and
$y > 0$. The positive solutions will be ordered by the size of x, or
what is the same thing, by the size of $x + y\sqrt{d}$, since if $x_1 + y_1\sqrt{d}$
and $x_2 + y_2\sqrt{d}$ are positive solutions, and $x_1 > x_2$, then
\[ x_1 + y_1\sqrt{d} > x_2 + y_2\sqrt{d}, \]
and conversely.


THEOREM 8-5. If $x_1, y_1$ is the minimal positive solution of equation (1),
then a general solution is given by the equation

$$x + y\sqrt{d} = \pm(x_1 + y_1\sqrt{d})^n, \tag{6}$$
where $n$ can assume any integral value, positive, negative or zero.


Remark: Because of this theorem, the minimal positive solution 
of (1) is sometimes called the fundamental solution. 


Proof: That (6) actually furnishes a solution for each $n > 0$, we
have just seen. Since

$$(x + y\sqrt{d})^{-n} = (x - y\sqrt{d})^n,$$

(6) also gives a solution for each $n < 0$. Since the solutions with
$y = 0$ correspond to $n = 0$, it suffices to show that (6) gives every
solution with $y \neq 0$. Furthermore, if $x_1$, $y_1$, and $n$ are positive and

$$x + y\sqrt{d} = (x_1 + y_1\sqrt{d})^n > 1,$$


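then
$$\begin{aligned}
-x + (-y)\sqrt{d} &= -(x_1 + y_1\sqrt{d})^{n} < 1, \\
x + (-y)\sqrt{d} &= (x_1 + y_1\sqrt{d})^{-n} < 1, \\
-x + y\sqrt{d} &= -(x_1 + y_1\sqrt{d})^{-n} < 1,
\end{aligned}$$
so that it suffices to show that every solution of (1) with both $x$ and $y$
positive (so that $x + y\sqrt{d} > 1$) satisfies the equation
$$x + y\sqrt{d} = (x_1 + y_1\sqrt{d})^n, \qquad n > 0.$$

Put $x_1 + y_1\sqrt{d} = \alpha$; then if $x, y$ is any positive solution of (1),
$x + y\sqrt{d} \ge \alpha$, since $\alpha$ is minimal. Hence there is an $n > 0$ such that
$$\alpha^n \le x + y\sqrt{d} < \alpha^{n+1}.$$

But then
$$1 \le (x + y\sqrt{d})\alpha^{-n} = (x + y\sqrt{d})(x_1 - y_1\sqrt{d})^n < \alpha,$$
and this, by Theorem 8-4, contradicts the minimality of $\alpha$ unless

$$(x + y\sqrt{d})(x_1 - y_1\sqrt{d})^n = 1,$$
whence
$$x + y\sqrt{d} = (x_1 + y_1\sqrt{d})^n.$$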
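
The content of Theorem 8-5 can be sketched numerically. The following Python fragment is only an illustration, not part of the proof: the function names `minimal_solution` and `pell_solutions` are hypothetical, and the minimal positive solution is found by a naive search that is adequate only for small $d$. Successive powers of $x_1 + y_1\sqrt{d}$ are formed with exact integer arithmetic, using the multiplication rule of Theorem 8-4.

```python
from math import isqrt


def minimal_solution(d, limit=10**6):
    """Naive search for the minimal positive solution of x^2 - d*y^2 = 1.

    Hypothetical helper for small nonsquare d; it simply tries y = 1, 2, ...
    """
    for y in range(1, limit):
        x2 = 1 + d * y * y
        x = isqrt(x2)
        if x * x == x2:
            return x, y
    raise ValueError("no solution found below limit")


def pell_solutions(d, x1, y1, count):
    """Return the first `count` positive solutions, i.e. the coefficients
    of (x1 + y1*sqrt(d))^n for n = 1, ..., count."""
    out = []
    x, y = x1, y1
    for _ in range(count):
        out.append((x, y))
        # (x + y*sqrt(d)) * (x1 + y1*sqrt(d)), as in equation (5)
        x, y = x * x1 + d * y * y1, x * y1 + y * x1
    return out


x1, y1 = minimal_solution(2)
solutions = pell_solutions(2, x1, y1, 4)
print(solutions)  # [(3, 2), (17, 12), (99, 70), (577, 408)]
```

Negative exponents and the sign $\pm$ in (6) only change the signs of $x$ and $y$, so the positive solutions listed this way account for all solutions with $y \neq 0$.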

Turning now to the case $N = -1$, we find a somewhat similar
situation, with the essential difference that the equation is not always
solvable. This is the case, for example, when $d = 3$, for the expression
$x^2 - 3y^2$ assumes only the values 0, 1, and 2 (mod 4). However,
it is again true that all solutions can be expressed in terms of a single
one, when such exists.


THEOREM 8-6. Let $d$ be a positive nonsquare integer. Then if the
equation

$$z^2 - dt^2 = -1 \tag{7}$$

is solvable, and if $z_1 + t_1\sqrt{d}$ is the minimal positive solution, a
general solution is given by

$$z + t\sqrt{d} = \pm(z_1 + t_1\sqrt{d})^{2n+1}, \qquad n = 0, \pm 1, \ldots.$$
With the earlier notation,
$$\alpha = x_1 + y_1\sqrt{d} = (z_1 + t_1\sqrt{d})^2.$$


Proof: We prove the second assertion first. It is clear that 


$(z_1 + t_1\sqrt{d})^2$ is a solution of (1), so that

$$1 < x_1 + y_1\sqrt{d} \le (z_1 + t_1\sqrt{d})^2. \tag{8}$$
This gives
$$-z_1 + t_1\sqrt{d} < -z_1x_1 + dy_1t_1 + (-z_1y_1 + t_1x_1)\sqrt{d} \le z_1 + t_1\sqrt{d},$$

where the number in the center of this inequality (which we will call
$z + t\sqrt{d}$, for the moment) is again a solution of (7), so that in
particular $t \neq 0$. But if a number lies between the minimal positive
solution of (7) and its reciprocal, the same is true of the reciprocal of
that number, so that either

$$1 < z + t\sqrt{d} \le z_1 + t_1\sqrt{d}$$

or

$$1 < -z + t\sqrt{d} < z_1 + t_1\sqrt{d}.$$
Using the minimality of $z_1 + t_1\sqrt{d}$, we conclude that
$$z + t\sqrt{d} = z_1 + t_1\sqrt{d}.$$


Now suppose that $z + t\sqrt{d}$ is any solution of (7), where we can
again restrict attention to the case $z, t > 0$. Then as in the proof of
Theorem 8-5, we can find an $n$ such that

$$1 < (z + t\sqrt{d})\alpha^{-n} < \alpha = (z_1 + t_1\sqrt{d})^2,$$
or, dividing through by $z_1 + t_1\sqrt{d}$,
$$-z_1 + t_1\sqrt{d} < x' + y'\sqrt{d} < z_1 + t_1\sqrt{d},$$
where $x' + y'\sqrt{d}$ satisfies (1). This inequality implies that
$$\alpha^{-1} < x' + y'\sqrt{d} < \alpha,$$
so that $x' + y'\sqrt{d} = 1$, and
$$z + t\sqrt{d} = (z_1 + t_1\sqrt{d})\alpha^{n} = (z_1 + t_1\sqrt{d})^{2n+1}.$$
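
Theorem 8-6 can be illustrated in the same spirit. The sketch below assumes that the minimal positive solution $z_1, t_1$ of (7) is already known (for $d = 2$ it is $1, 1$); the function name `negative_pell_solutions` is hypothetical. Odd powers of $z_1 + t_1\sqrt{d}$ are obtained by repeated multiplication by $\alpha = (z_1 + t_1\sqrt{d})^2$.

```python
def negative_pell_solutions(d, z1, t1, count):
    """Positive solutions of z^2 - d*t^2 = -1 as odd powers of z1 + t1*sqrt(d).

    (z1, t1) is assumed to be the minimal positive solution of equation (7).
    """
    # alpha = (z1 + t1*sqrt(d))^2 = ax + ay*sqrt(d) is a solution of equation (1).
    ax, ay = z1 * z1 + d * t1 * t1, 2 * z1 * t1
    out = []
    z, t = z1, t1
    for _ in range(count):
        out.append((z, t))
        # multiply by alpha to pass from exponent 2n+1 to 2n+3
        z, t = z * ax + d * t * ay, z * ay + t * ax
    return out


sols = negative_pell_solutions(2, 1, 1, 4)
print(sols)  # [(1, 1), (7, 5), (41, 29), (239, 169)]
```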


PROBLEMS 


1. Find a general solution of the equation $x^2 - 2y^2 = 1$.
2. Describe all the integral solutions of the equation

$$x^2 + 6xy + 7y^2 + 8x + 24y + 15 = 0.$$


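3. Show that
$$\liminf_{n \to \infty}\, \bigl(n(n\sqrt{2} - [n\sqrt{2}])\bigr) = \frac{1}{2\sqrt{2}}.$$

[The assertion means simply that if $a_n$ stands for the quantity in parentheses,
and if $\epsilon > 0$, then
$$a_n < \frac{1 + \epsilon}{2\sqrt{2}}$$

for infinitely many $n$, while
$$a_n > \frac{1 - \epsilon}{2\sqrt{2}}$$

for all sufficiently large $n$.]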
4. Show that a necessary condition that the equation $x^2 - dy^2 = -1$
be solvable is that $d$ have a proper representation as a sum of two squares.


8-3 The case |N| > 1. Because of its special interest in connec- 
tion with the units of real quadratic fields, we consider separately the 
case |N| = 4. 


THEOREM 8-7. Let $d$ be positive and not a square. If $r_1 + s_1\sqrt{d}$ is
the minimal positive solution of the equation

$$r^2 - ds^2 = 4, \tag{9}$$
then a general solution is given by
$$r + s\sqrt{d} = \pm 2\left(\frac{r_1 + s_1\sqrt{d}}{2}\right)^{n}, \qquad n = 0, \pm 1, \ldots. \tag{10}$$
If the equation
$$r^2 - ds^2 = -4 \tag{11}$$

is solvable, and its minimal positive solution is $r_1' + s_1'\sqrt{d}$, then a
general solution is given by

$$r' + s'\sqrt{d} = \pm 2\left(\frac{r_1' + s_1'\sqrt{d}}{2}\right)^{2n+1}, \qquad n = 0, \pm 1, \ldots.$$


Proof: Clearly, the double of any solution of (1) is a solution of (9).
While this remark shows that (9) is always solvable, not all the
solutions can necessarily be found in this way, since, for example,
$3^2 - 5 \cdot 1^2 = 4$, and 3 and 1 are odd.


If $r_2 + s_2\sqrt{d}$ and $r_3 + s_3\sqrt{d}$ are any solutions of (9), then

$$r + s\sqrt{d} = \frac{(r_2 + s_2\sqrt{d})(r_3 + s_3\sqrt{d})}{2} = \frac{r_2r_3 + ds_2s_3}{2} + \frac{r_2s_3 + r_3s_2}{2}\sqrt{d}$$

is another integral solution. For, from (9), $r_i^2 \equiv ds_i^2 \pmod{2}$, so
that $r_i \equiv ds_i \pmod{2}$. Hence

$$2r = r_2r_3 + ds_2s_3 \equiv d^2s_2s_3 + ds_2s_3 = d(d + 1)s_2s_3 \equiv 0 \pmod{2}$$
and

$$2s = r_2s_3 + r_3s_2 \equiv ds_2s_3 + ds_2s_3 = 2ds_2s_3 \equiv 0 \pmod{2},$$
so that $r$ and $s$ are integers. Also,

$$(r + s\sqrt{d})(r - s\sqrt{d}) = r^2 - ds^2 = \frac{(r_2^2 - ds_2^2)(r_3^2 - ds_3^2)}{4} = 4.$$
It follows that the numbers $r + s\sqrt{d}$ defined in (10) are solutions
of (9), for every $n$. The remainder of the proof for the case $N = 4$ is
an easy modification of the proof of Theorem 8-5. The proof for
$N = -4$ is a straightforward combination of the above considerations
and the proof of Theorem 8-6.


For general N, the situation is rather complicated. The following 
theorem gives a partial result. 


THEOREM 8-8. If $d > 0$ is not a square, and if the Pell equation
$$u^2 - dv^2 = N \tag{12}$$

has one solution, it has infinitely many. In particular, if $x_1, y_1$ is a
solution of equation (1) and $u_1, v_1$ is a solution of equation (12), then
the integers $u, v$ determined by

$$u + v\sqrt{d} = (x_1 + y_1\sqrt{d})(u_1 + v_1\sqrt{d}), \tag{13}$$
form a solution of equation (12).


Proof: The second statement is proved in exactly the same way 
as was Theorem 8-4. The first statement follows immediately from 
the second, making use of Theorem 8-5. 

Notice that it may not be possible to obtain all solutions of (12)
from one solution and the set of all solutions of (1). For example,
the equation $u^2 - 2v^2 = 49$ has the solutions $7$ and $9 + 4\sqrt{2}$, and
neither can be obtained from the other by multiplying by a solution
of $x^2 - 2y^2 = 1$.




Theorem 8-8 can be used to obtain a finite criterion for the solvability
of (12). If two solutions of (12) are related as in (13), we say
that they belong to the same class. We now find bounds on the
smallest element of each class, where the solutions are ordered by the
size of $u$. (We can require that $u$ be positive, since $u + v\sqrt{d}$ and
$-u - v\sqrt{d}$ are in the same class.) The investigation is carried out
only for $N > 0$; the case $N < 0$ is similar.

To do this we ask, given a solution $u_1 + v_1\sqrt{d}$ of (12), with
$u_1 > 0$, when is it possible to find a smaller solution $u + v\sqrt{d}$, with
$u > 0$, in the same class? That is, we want to find $u$ and $v$ such that

$$u + v\sqrt{d} = (x + y\sqrt{d})(u_1 + v_1\sqrt{d}), \qquad 0 < u < u_1,$$
$$x^2 - dy^2 = 1.$$
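Let $\alpha = x_1 + y_1\sqrt{d}$ be the minimal positive solution of (1). If
$v_1 > 0$, take $x + y\sqrt{d} = \alpha^{-1} = x_1 - y_1\sqrt{d}$; while if $v_1 < 0$, take
$x + y\sqrt{d} = \alpha$; in either case, we get

$$\begin{aligned}
u &= u_1x_1 - y_1|v_1|d = u_1\left(x_1 - y_1\sqrt{d}\,\frac{|v_1|\sqrt{d}}{u_1}\right) \\
&= u_1\left\{x_1 - y_1\sqrt{d} + y_1\sqrt{d}\left(1 - \sqrt{1 - \frac{N}{u_1^2}}\right)\right\}.
\end{aligned}$$

Here $0 < N/u_1^2 < 1$. Since
$$0 < 1 - \sqrt{1 - t} = \frac{t}{1 + \sqrt{1 - t}} < \frac{t}{2 - t}$$
for $0 < t < 1$, we have
$$0 < u < u_1\left(\frac{1}{\alpha} + \frac{y_1\sqrt{d}\,N}{2u_1^2 - N}\right).$$

A little manipulation shows that the coefficient of $u_1$ is smaller than 1,
so that $u < u_1$, if

$$u_1 > \sqrt{\delta N}, \quad \text{where} \quad \delta = \frac{\beta y_1\sqrt{d} + 1}{2}, \quad \beta = \frac{\alpha}{\alpha - 1}.$$

Since $y_1\sqrt{d} = \sqrt{x_1^2 - 1} < x_1$, we have proved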
THEOREM 8-9. If equation (12) is solvable, it has a solution with

$$0 < u \le \sqrt{\tfrac{1}{2}(\beta x_1 + 1)N}, \tag{14}$$

where $\alpha = x_1 + y_1\sqrt{d}$ is the minimal positive solution of equation (1)
and $\beta = \alpha/(\alpha - 1)$. If there are two or more classes of solutions of
equation (12), each contains an element for which equation (14) holds.


This reduces the question of the solvability of (12) to a finite
problem, once the minimal positive solution of (1) is known; it
suffices to decide whether any of the numbers $(u^2 - N)/d$ is a square,
for $u$ in the interval (14). If there are two or more such values of $u$,
it is a simple matter to decide whether the corresponding solutions
are in the same class.

For example, when $d = 2$ we have the minimal positive solution
$3^2 - 2 \cdot 2^2 = 1$, and it is easily seen that every $u$ satisfying (14)
satisfies $0 < u < 3\sqrt{N}$.
Since also $N = u^2 - 2v^2 \le u^2$, we need only examine the integers $u$
between $\sqrt{N}$ and $3\sqrt{N}$, for each $N$.
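
The finite search just described for $d = 2$ might be carried out as follows. This is a minimal sketch under the assumptions of the example ($d = 2$, $N > 0$, and the range $\sqrt{N} \le u < 3\sqrt{N}$); the function name `small_solutions` is hypothetical, and only integer arithmetic is used so that the tests are exact.

```python
from math import isqrt


def small_solutions(N):
    """Solutions (u, v) with v >= 0 of u^2 - 2*v^2 = N in the range
    sqrt(N) <= u < 3*sqrt(N), for a positive integer N."""
    found = []
    u = isqrt(N)
    if u * u < N:
        u += 1                      # smallest integer u with u^2 >= N
    while u * u < 9 * N:            # u < 3*sqrt(N), tested without floats
        rest = u * u - N            # must equal 2*v^2
        if rest % 2 == 0:
            v = isqrt(rest // 2)
            if 2 * v * v == rest:
                found.append((u, v))
        u += 1
    return found


print(small_solutions(7))   # [(3, 1), (5, 3)]
print(small_solutions(49))  # [(7, 0), (9, 4), (11, 6)]
print(small_solutions(3))   # []
```

An empty list means that the equation has no solution at all for that $N$, by Theorem 8-9. When several values of $u$ appear, some may belong to the same class; for instance $11 + 6\sqrt{2} = (9 - 4\sqrt{2})(3 + 2\sqrt{2})$.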


PROBLEMS 


1. Complete the proof of Theorem 8-7. 

2. The statement obtained from Theorem 8-7 by replacing 2 and 4 by 
7 and 49, respectively, is false, as is seen by considering the numerical 
example immediately following Theorem 8-8. Where would the analogous 
proof break down? 

3. Show that if $N < 0$, Theorem 8-9 remains correct if the inequality
(14) is replaced by
$$0 \le u \le \sqrt{\frac{\alpha x_1 |N|}{2(\alpha + 1)}}.$$

[Hint: Prove and use the fact that for $t > 0$, $\sqrt{1 + t} - 1 < t/2$.]


4. Describe all the units of $R(\sqrt{2})$; of $R(\sqrt{5})$. Cf. Problem 1, Section
8-1.


8-4 An application. We showed in Theorem 8-1 that if $\xi$ is irrational,
the inequality
$$\left|\xi - \frac{x}{y}\right| < \frac{1}{y^2}$$
has infinitely many solutions in integers $x, y$. It is the object of the
present section to make a more detailed examination of the approximability
of a quadratic irrationality (that is, an irrational root of a
quadratic equation with integral coefficients) by rationals, making
use of the preceding results concerning Pell's equation.




It is easy to see that when $\xi$ is a quadratic irrationality, there is a
constant $g_0 = g_0(\xi)$ such that the inequality

$$\left|\xi - \frac{x}{y}\right| < \frac{1}{gy^2}$$
does not hold for any $x, y$ if $g > g_0$. For if $\xi$ is defined by the equation
$$f(\xi) = a\xi^2 + b\xi + c = 0, \qquad a, b, c \text{ integers},$$
and if $f(x)$ factors as
$$f(x) = a(x - \xi)(x - \xi'),$$
then
$$\left|\xi - \frac{x}{y}\right| = \frac{|ax^2 + bxy + cy^2|}{y^2|a| \cdot |\xi' - x/y|} \ge \frac{1}{y^2|a| \cdot |\xi' - x/y|},$$
since $ax^2 + bxy + cy^2$ is an integer different from zero. Since $\xi$ is
irrational, $\xi \neq \xi'$. Hence from the above inequality, either
[ee 4| | 1 
> —=| > —-—______ ; 
2 5 y| = 3ylahle — £9721 


ad 
y 
and we can take 


—_ mf 2 ee 
a(@) = min (25, ae) 


[= 


for any e > 0. 
We are thus led to consider the quantity $M(\xi)$, which is the upper
limit of those numbers $\lambda$ for which the inequality

$$\left|\xi - \frac{x}{y}\right| < \frac{1}{\lambda y^2}$$
has infinitely many solutions. It was first treated by A. Markov, who
made an extensive investigation of $M(\xi)$ in connection with the
problem of determining an upper bound for the minimum value
assumed by an indefinite quadratic form, i.e., an expression

$$Ax^2 + Bxy + Cy^2$$

in which $D = B^2 - 4AC > 0$, $D$ is not a square, and $x, y$ are integral
variables. Markov made use of the theory of continued fractions,
but we shall derive certain of his results using only the theorems just
proved concerning Pell's equation.




In order to avoid interrupting the argument later, we first prove a 
lemma. 


THEOREM 8-10. Let
$$f(x, y) = ax^2 + bxy + cy^2$$
have integral coefficients such that $d = b^2 - 4ac > 0$ and $d$ is not a
square. Then if the equation
$$f(x, y) = k \tag{15}$$

has one solution in integers, it has infinitely many, and each of the
two quantities

$$|2ax + by + y\sqrt{d}| \quad \text{and} \quad |2ax + by - y\sqrt{d}| \tag{16}$$

is less than any prescribed positive number for infinitely many such
solutions.


Proof: (a) In the case $k = a$, let $X, Y$ be integers such that
$X^2 - dY^2 = 1$, so that
$$4a^2X^2 - 4a^2dY^2 = 4a^2.$$
If we put
$$2aX = 2ax + by, \qquad 2aY = y,$$
in which case $x$ and $y$, given by
$$x = X - bY, \qquad y = 2aY, \tag{17}$$
are integers, then
$$(2ax + by)^2 - dy^2 = 4af(x, y) = 4a^2,$$
or
$$f(x, y) = a. \tag{18}$$

Since by Theorem 8-5 there are infinitely many pairs $X, Y$, there are
infinitely many solutions of (18). Since

$$\lim_{y \to \infty}\left(\frac{2ax + by}{y}\right)^2 = \lim_{Y \to \infty}\frac{X^2}{Y^2} = d,$$

it is clear that one of the quantities in (16) is smaller than any prescribed
$\epsilon > 0$ for sufficiently large $y$. Moreover, if the integers $x, y$
determined by $X, Y$ in (17) are such that one of the quantities in (16)
is small, then the numbers $x', y'$ determined by $X, -Y$ are such that
the other quantity is small.




(b) In the case $k \neq a$, let $x_1, y_1$ be a solution of (15), and $x_2, y_2$ a
solution of (18). Then the integers $x, y$ defined by the equation

$$2ax + by + y\sqrt{d} = \frac{(2ax_1 + by_1 + y_1\sqrt{d})(2ax_2 + by_2 - y_2\sqrt{d})}{2a}$$
again satisfy (15). Since there are infinitely many solutions of (18),
the same is true of (15). Furthermore, for fixed $x_1, y_1$, the first
quantity in (16) will be small if $x_2, y_2$ ranges over those solutions of
(18) for which $|2ax_2 + by_2 - y_2\sqrt{d}|$ is small, while the second will
be small for the remaining solutions of (18).


THEOREM 8-11. Let $\xi$ be a real quadratic irrationality of discriminant
$d$:

$$a\xi^2 + b\xi + c = 0, \qquad d = b^2 - 4ac > 0, \qquad d \text{ not a square},$$
$$(a, b, c) = 1, \qquad a > 0.$$
Then if $k$ is the smallest positive integer for which the equation
$$|ax^2 + bxy + cy^2| = k$$
has an integral solution,

$$M(\xi) = \frac{\sqrt{d}}{k}.$$


Proof: (a) $M(\xi)$ must be less than or equal to $\sqrt{d}/k$. For
assume on the contrary that

$$M(\xi) > \frac{\sqrt{d}}{(1 - \delta)k},$$
where $0 < \delta < 1$. Then the inequality
$$\left|\frac{\sqrt{d} - b}{2a} - \frac{x}{y}\right| < \frac{(1 - \delta)k}{\sqrt{d}\,y^2}$$

holds for an infinite sequence $S$ of distinct fractions $x/y$. (The case
that $\xi = (-b - \sqrt{d})/2a$ is treated similarly.) Then, multiplying
through by $\sqrt{d}$, we have
b d (1 — d)k 
Pee le Oe 
( 7 dal 
2a(1 — 5)k 


’ va - |< e. 




Hence 

2a(1—d)k| - dy d?y? 

sd cd pe see ci te 
y|2ax + by| Mee 2ax + | ‘ | (2az + by)? 


4ad ‘ ‘ 4akd 
ax + bye ®t bt l= Oo ay? 


and, therefore, 
2dy | 
> ee 
Zax + byl -|Wd + dy/(2ax + by)| 
Ee ee ee eee 
lb + 2ax/y| -|Wd + dy/(2az + bdy)| 


But as $x/y$ runs through the sequence $S$, $y$ increases without limit and


_ 


(19) 


$$b + \frac{2ax}{y} \to \sqrt{d}, \qquad \sqrt{d} + \frac{dy}{2ax + by} \to 2\sqrt{d},$$

so that
$$\lim_{y \to \infty} \frac{2d}{|b + 2ax/y| \cdot |\sqrt{d} + dy/(2ax + by)|} = 1,$$
which contradicts (19).


(b) $M(\xi)$ must be greater than or equal to $\sqrt{d}/k$. For from the
definition of $k$, and Theorem 8-10, we have that the equation

$$|(2ax + by)^2 - dy^2| = 4ak$$
has infinitely many integral solutions $x, y$. The left side factors into
$$|2ax + by - y\sqrt{d}| \cdot |2ax + by + y\sqrt{d}| = 4ak. \tag{20}$$

By Theorem 8-10, each of these factors is small for infinitely many
pairs $x, y$. Henceforth we restrict $x, y$ to the set $T$ of solutions of (20)
for which the first factor is smaller than the second. (The proof in
the alternate case proceeds similarly.) Then

$$\frac{x}{y} \to \frac{-b + \sqrt{d}}{2a}$$

as $|y|$ tends to infinity. Furthermore,




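$$\left|\frac{x}{y} - \frac{-b + \sqrt{d}}{2a}\right| = \frac{4ak}{4a^2\,|x/y + (b + \sqrt{d})/2a|\,y^2} = \frac{k}{a\,|x/y + (b + \sqrt{d})/2a|\,y^2}$$

and
$$\frac{x}{y} + \frac{b + \sqrt{d}}{2a} \to \frac{\sqrt{d}}{a}$$
as $|y|$ tends to infinity. Hence, given $\epsilon > 0$, the inequality
$$\left|\frac{x}{y} - \frac{-b + \sqrt{d}}{2a}\right| < \frac{k(1 + \epsilon)}{y^2\sqrt{d}}$$

holds for all $(x, y) \in T$ with $|y| > y_0(\epsilon)$. Hence $M(\xi) \ge \sqrt{d}/k$.
The proof is now complete, since if $M(\xi) \ge \sqrt{d}/k$ and also
$M(\xi) \le \sqrt{d}/k$, it must be true that $M(\xi) = \sqrt{d}/k$.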
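
As a numerical illustration of Theorem 8-11 (a sketch only, not part of the proof), take $\xi = \sqrt{2}$, so that $a = 1$, $b = 0$, $c = -2$, $d = 8$ and $k = 1$, and the theorem gives $M(\sqrt{2}) = 2\sqrt{2}$. The fragment below follows the pairs $x, y$ with $x^2 - 2y^2 = \pm 1$, generated as powers of $1 + \sqrt{2}$, and evaluates $y^2|\sqrt{2} - x/y|$, which should approach $1/M(\sqrt{2}) = 1/(2\sqrt{2})$. The function name `approximations_sqrt2` is hypothetical.

```python
from math import sqrt


def approximations_sqrt2(count):
    """Pairs (x, y) with x^2 - 2*y^2 = +1 or -1, the powers of 1 + sqrt(2)."""
    pairs = []
    x, y = 1, 1
    for _ in range(count):
        pairs.append((x, y))
        x, y = x + 2 * y, x + y     # multiply x + y*sqrt(2) by 1 + sqrt(2)
    return pairs


values = []
for x, y in approximations_sqrt2(12):
    # y^2 * |sqrt(2) - x/y| = y * |2*y^2 - x^2| / (y*sqrt(2) + x),
    # and |x^2 - 2*y^2| = 1 for every pair generated above.
    values.append(y / (y * sqrt(2) + x))

print(values[-1], 1 / (2 * sqrt(2)))
```

The rewritten form of the quantity avoids subtracting two nearly equal floating-point numbers.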
COROLLARY. If $\xi$ is defined as in Theorem 8-11, then
$$\frac{\sqrt{d}}{a} \le M(\xi) \le \sqrt{d}.$$
For clearly $k \ge 1$, and $k \le a$ since $a \cdot 1^2 + b \cdot 1 \cdot 0 + c \cdot 0^2 = a$.


PROBLEM
Generalize the result of Problem 3, Section 8-2, evaluating

$$\liminf_{n \to \infty}\, n(n\sqrt{m} - [n\sqrt{m}]),$$

where $m$ is a positive square-free integer.


8-5 The minima of indefinite quadratic forms. So far we have
used Theorem 8-11 to obtain information concerning the quantity
$M(\xi)$; it can also be used, in conjunction with the following well-known
theorem of A. Hurwitz, to obtain information about the
numerically smallest value assumed by an indefinite quadratic form.


THEOREM 8-12 (Hurwitz' theorem). If $\xi$ is any irrational number,
then there are infinitely many integral solutions $x, y$ of the inequality
$$\left|\xi - \frac{x}{y}\right| < \frac{1}{\sqrt{5}\,y^2}.$$

Consequently, $M(\xi) \ge \sqrt{5}$ for every irrational $\xi$.




We defer the proof for a moment. Assuming the theorem to be 
correct, a comparison of it and Theorem 8-11 yields the following 
result. 


THEOREM 8-13. If $f(x, y)$ is an indefinite binary quadratic form of
nonsquare discriminant $d$, then

$$0 < |f(x, y)| \le \frac{\sqrt{d}}{\sqrt{5}}$$

for suitable integers $x, y$.

The coefficient $1/\sqrt{5}$ occurring here is best possible, in the sense
that the theorem becomes false (for some quadratic forms) if $1/\sqrt{5}$
is replaced by a smaller constant. For the form $k(x^2 + xy - y^2)$ has
discriminant $5k^2$, and it is clear that this form assumes no nonzero
value numerically smaller than $|k(1^2 + 1 \cdot 0 - 0^2)| = |k|$.


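8-6 Farey sequences, and a proof of Hurwitz' theorem. A very
simple proof of Hurwitz' theorem can be deduced from the well-known
properties of the so-called Farey sequences $F_n$, which are the
sequences of rational numbers $a/b$ with $0 < b \le n$, $(a, b) = 1$,
arranged in increasing order of magnitude. Thus for the first few
values of $n$ we have

$F_1$: $\ldots, \tfrac{-1}{1}, \tfrac{0}{1}, \tfrac{1}{1}, \tfrac{2}{1}, \ldots$
$F_2$: $\ldots, \tfrac{-1}{2}, \tfrac{0}{1}, \tfrac{1}{2}, \tfrac{1}{1}, \tfrac{3}{2}, \ldots$
$F_3$: $\ldots, \tfrac{-1}{3}, \tfrac{0}{1}, \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{2}{3}, \tfrac{1}{1}, \tfrac{4}{3}, \ldots$
$F_4$: $\ldots, \tfrac{-1}{4}, \tfrac{0}{1}, \tfrac{1}{4}, \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{2}{3}, \tfrac{3}{4}, \tfrac{1}{1}, \tfrac{5}{4}, \ldots$
$F_5$: $\ldots, \tfrac{-1}{5}, \tfrac{0}{1}, \tfrac{1}{5}, \tfrac{1}{4}, \tfrac{1}{3}, \tfrac{2}{5}, \tfrac{1}{2}, \tfrac{3}{5}, \tfrac{2}{3}, \tfrac{3}{4}, \tfrac{4}{5}, \tfrac{1}{1}, \tfrac{6}{5}, \ldots$

Clearly the number of elements of $F_n$ which lie between 0 and 1 inclusive
is $1 + \varphi(1) + \varphi(2) + \cdots + \varphi(n)$.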
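
The definition and the count can be checked directly for small $n$. The following sketch builds the part of $F_n$ lying in $[0, 1]$ straight from the definition; the function names `farey_unit_interval` and `phi` are hypothetical, and no attempt is made at efficiency.

```python
from fractions import Fraction
from math import gcd


def farey_unit_interval(n):
    """Elements of F_n lying in [0, 1], in increasing order.

    Fraction reduces a/b automatically, so the set keeps each
    rational number once, in lowest terms.
    """
    return sorted({Fraction(a, b) for b in range(1, n + 1) for a in range(b + 1)})


def phi(m):
    """Euler's function, counted directly from its definition."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)


F5 = farey_unit_interval(5)
print([str(f) for f in F5])
# ['0', '1/5', '1/4', '1/3', '2/5', '1/2', '3/5', '2/3', '3/4', '4/5', '1']
print(len(F5), 1 + sum(phi(m) for m in range(1, 6)))  # 11 11
```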




The rational numbers $p/q$ and $r/s$ are said to be adjacent in $F_n$ if
they are successive elements of $F_n$.

THEOREM 8-14. (a) If $p/q$ and $r/s$ are adjacent in $F_n$, then
$|ps - qr| = 1$.
(b) If $|ps - qr| = 1$, then $p/q$ and $r/s$ are adjacent in $F_n$ for
$$\max(q, s) \le n < q + s,$$
and they are separated by the single element $(p + r)/(q + s)$ in $F_{q+s}$.

Remark: This theorem on the one hand gives necessary and sufficient
conditions that $p/q$ and $r/s$ be adjacent in $F_n$, and on the other
hand gives the law of formation of the new elements that appear in
going from $F_n$ to $F_{n+1}$. The number $(p + r)/(q + s)$ is called the
mediant of $p/q$ and $r/s$.


Proof: Suppose that $p/q$ and $r/s$ are elements of $F_n$ such that
$qr - ps = 1$, so that $r/s > p/q$. As $t$ varies continuously from zero
to infinity, the number
$$f(t) = \frac{p + tr}{q + ts}$$
increases steadily from $p/q$ to $r/s$, so that there is a one-to-one
correspondence between the positive real numbers $t$ and the points of
the interval

$$\frac{p}{q} < x < \frac{r}{s}. \tag{21}$$

Moreover, it is clear that $f(t)$ is rational if and only if $t$ is rational;
since we are interested only in the rational numbers in the interval,
we put $t = u/v$, where $(u, v) = 1$ and $u > 0$, $v > 0$. This gives
$$f\left(\frac{u}{v}\right) = \frac{vp + ur}{vq + us}.$$
Since
$$q(vp + ur) - p(vq + us) = u(qr - ps) = u,$$
$$s(vp + ur) - r(vq + us) = v(ps - qr) = -v,$$
we have $(vp + ur, vq + us) = 1$. Thus we have shown that as $u$ and
$v$ run over all pairs of relatively prime positive integers, the reduced
fraction $(vp + ur)/(vq + us)$ runs over all rational numbers between
$p/q$ and $r/s$.




Among these fractions, the one with $u = v = 1$ is clearly the unique
one of smallest denominator; it is the mediant of $p/q$ and $r/s$, and

$$|(p + r)q - (q + s)p| = 1, \qquad |r(q + s) - s(p + r)| = 1.$$

Since $q + s > \max(q, s)$, part (b) of the theorem follows. To
prove (a), we proceed inductively. $F_1$ consists of the integers
$\ldots, -1/1, 0/1, 1/1, \ldots$, and $|a \cdot 1 - (a + 1) \cdot 1| = 1$, so that (a) is
true for $n = 1$. If it is true for $n = m$, it is also true for $n = m + 1$,
since the only elements of $F_{m+1}$ not in $F_m$ are certain mediants of
adjacent elements of $F_m$. The assertion follows by the induction
principle.
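
Both parts of Theorem 8-14 lend themselves to a direct check for small $n$. The sketch below is an illustration restricted to the portion of $F_n$ in $[0, 1]$; the function names `farey`, `adjacent_determinants_ok` and `new_elements_are_mediants` are hypothetical.

```python
from fractions import Fraction


def farey(n):
    """Elements of F_n in [0, 1], in increasing order."""
    return sorted({Fraction(a, b) for b in range(1, n + 1) for a in range(b + 1)})


def adjacent_determinants_ok(n):
    """True if every adjacent pair p/q < r/s of F_n satisfies q*r - p*s = 1."""
    F = farey(n)
    return all(
        right.numerator * left.denominator - left.numerator * right.denominator == 1
        for left, right in zip(F, F[1:])
    )


def new_elements_are_mediants(n):
    """True if the elements of F_{n+1} not in F_n are exactly the mediants
    of those adjacent pairs of F_n whose denominators add up to n + 1."""
    F = farey(n)
    new = set(farey(n + 1)) - set(F)
    mediants = {
        Fraction(l.numerator + r.numerator, l.denominator + r.denominator)
        for l, r in zip(F, F[1:])
        if l.denominator + r.denominator == n + 1
    }
    return new == mediants


print(all(adjacent_determinants_ok(n) for n in range(1, 20)))   # True
print(all(new_elements_are_mediants(n) for n in range(1, 20)))  # True
```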


Proof of Hurwitz' theorem: If $a/b$ is a reduced fraction and $c$ is a
positive real number, designate by $I_c(a/b)$ the closed interval

$$\left[\frac{a}{b} - \frac{1}{cb^2},\ \frac{a}{b} + \frac{1}{cb^2}\right].$$
Hurwitz' theorem says that if $\xi$ is irrational, there are infinitely many
fractions $x/y$ such that $\xi \in I_{\sqrt{5}}(x/y)$.
For each $n$, $\xi$ lies between some two adjacent elements of $F_n$, say

$$\frac{p}{q} < \xi < \frac{r}{s}.$$
We divide the interval $[p/q, r/s]$ into left and right halves:

$$J_L = \left[\frac{p}{q}, \frac{p + r}{q + s}\right], \qquad J_R = \left[\frac{p + r}{q + s}, \frac{r}{s}\right].$$

We now ask, how large may $c$ be if it is required that the three intervals
$I_c(p/q)$, $I_c((p + r)/(q + s))$, $I_c(r/s)$ together completely cover
the interval $J_L$? If this is the case, and $\xi \in J_L$, then $\xi$ must be an
interior point of one of these intervals $I_c$, and we have a solution of
the inequality $|\xi - x/y| < 1/cy^2$.
Clearly $I_c(p/q)$ and $I_c(r/s)$ overlap (or abut) if and only if

$$\frac{p}{q} + \frac{1}{cq^2} \ge \frac{r}{s} - \frac{1}{cs^2},$$
and this reduces, with the aid of the relation $rq - ps = 1$, to

$$c \le qs\left(\frac{1}{q^2} + \frac{1}{s^2}\right) = \frac{s}{q} + \frac{q}{s},$$




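or, putting $f(x) = x + 1/x$, to

$$c \le f\left(\frac{s}{q}\right).$$

Similarly, $I_c(p/q)$ and $I_c((p + r)/(q + s))$ overlap if and only if

$$c \le f\left(\frac{q + s}{q}\right) = f\left(1 + \frac{s}{q}\right),$$

and so $J_L$ is certainly covered by the intervals $I_c$ if

$$c \le \max\,(f(s/q), f(1 + s/q)),$$

and a fortiori if
$$c \le \min_{x > 0}\,\{\max\,(f(x), f(1 + x))\}.$$

But a glance at the curves $y = f(x)$ and $y = f(1 + x)$ shows that
the curve $y = \max\,(f(x), f(1 + x))$ is concave upward for $x > 0$, and
has its minimum $c_0$ at $x = x_0$, where $f(x_0) = f(1 + x_0)$. (See
Fig. 8-1.) A simple calculation gives

$$x_0 = \frac{\sqrt{5} - 1}{2}, \qquad c_0 = \sqrt{5}.$$

[FIGURE 8-1: the curves $y = f(x)$ and $y = f(1 + x)$, with the minimum value $c_0$ of their maximum attained at $x = x_0$.]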
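
The "simple calculation" can be corroborated numerically. The following sketch evaluates $\max(f(x), f(1 + x))$ on a grid (the grid spacing and the name `g` are arbitrary choices made here for illustration) and compares the smallest value found with $\sqrt{5}$; the exact values come from solving $f(x_0) = f(1 + x_0)$, that is, $x_0^2 + x_0 - 1 = 0$.

```python
from math import sqrt


def f(x):
    return x + 1 / x


def g(x):
    """The upper envelope max(f(x), f(1 + x)) considered in the text."""
    return max(f(x), f(1 + x))


x0 = (sqrt(5) - 1) / 2                            # positive root of x^2 + x - 1 = 0
grid = [k / 10000 for k in range(1, 30001)]       # 0 < x <= 3
x_min = min(grid, key=g)

print(x_min, x0)          # grid minimizer, close to x0
print(g(x0), sqrt(5))     # both approximately 2.2360679...
```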


The proof can now be completed in either of two ways. The simpler
is to note that $\xi$, being irrational, must lie, for infinitely many $n$, in the
left half $J_L$ of the interval between its surrounding Farey points;
for if not, it would have to lie in all the intervals

$$\left[\frac{p}{q}, \frac{r}{s}\right], \quad \left[\frac{p + r}{q + s}, \frac{r}{s}\right], \quad \left[\frac{p + 2r}{q + 2s}, \frac{r}{s}\right], \quad \ldots$$

and the only point common to all these intervals is $r/s$ itself. And
whenever $\xi \in J_L$, the above argument shows that at least one of the
numbers

$$\frac{p}{q}, \quad \frac{p + r}{q + s}, \quad \frac{r}{s}$$

affords a solution of Hurwitz' inequality. Finally, this gives infinitely




many solutions, because $\xi$ lies in infinitely many $J_L$'s and infinitely
many $J_R$'s, so that only finitely many of these intervals have a
common end point.

Alternatively, one can also examine the conditions under which the
intervals $I_c(p/q)$, $I_c((p + r)/(q + s))$, and $I_c(r/s)$ completely cover
the interval $J_R$; an argument similar to that given above shows that
this is the case if

$$c \le \max\,\bigl(f(q/s),\ f(1 + q/s)\bigr).$$

It is then not necessary to distinguish the cases $\xi \in J_L$, $\xi \in J_R$.

Again using the fact that $\xi$ lies in infinitely many left half-intervals
and infinitely many right half-intervals, we deduce the following
stronger form of Hurwitz' theorem.

THEOREM 8-15. If $\xi$ is irrational there are infinitely many solutions
of the inequality

$$\left|\xi - \frac{x}{y}\right| < \frac{1}{\sqrt{5}\,y^2}. \tag{22}$$

If, for arbitrary $n$, $\xi$ lies between the adjacent elements $p/q$ and $r/s$ of
$F_n$, then at least one of the three numbers $p/q$, $(p + r)/(q + s)$, $r/s$
is a solution of the inequality (22).
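
The second assertion of Theorem 8-15 can be tried out in floating-point arithmetic. The sketch below is illustrative only: it locates the neighbors of $\xi$ in $F_n$ by repeatedly inserting mediants, which is justified by Theorem 8-14, and then tests the three candidates against (22). The function names are hypothetical, and the floating-point comparison is an assumption that is adequate for the small denominators used here.

```python
from math import floor, pi, sqrt


def farey_neighbours(xi, n):
    """Adjacent elements p/q < xi < r/s of F_n for irrational xi."""
    p, q = floor(xi), 1
    r, s = p + 1, 1                  # adjacent in F_1
    while q + s <= n:                # the mediant still belongs to F_n
        m_num, m_den = p + r, q + s
        if m_num < xi * m_den:
            p, q = m_num, m_den      # mediant lies to the left of xi
        else:
            r, s = m_num, m_den      # mediant lies to the right of xi
    return (p, q), (r, s)


def satisfies_22(xi, x, y):
    """Inequality (22): |xi - x/y| < 1 / (sqrt(5) * y^2)."""
    return abs(xi - x / y) < 1 / (sqrt(5) * y * y)


def theorem_8_15_holds(xi, n):
    (p, q), (r, s) = farey_neighbours(xi, n)
    candidates = [(p, q), (p + r, q + s), (r, s)]
    return any(satisfies_22(xi, x, y) for x, y in candidates)


print(farey_neighbours(sqrt(2), 5))                                   # ((7, 5), (3, 2))
print(all(theorem_8_15_holds(sqrt(2), n) for n in range(1, 121)))     # True
```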


CHAPTER 9 


RATIONAL APPROXIMATIONS TO REAL NUMBERS 


9-1 Introduction. In the investigation of the solvability of the
equation $n = x^2 + y^2$ in Chapter 7, it was convenient to use the fact
that if $x$ is real and $t$ is a positive integer, there are integers $p$ and $q$
such that


1 
L—-pi <-> l<q<t. 
lg rere <q< 
In connection with Pell's equation we used an easy consequence of
this theorem, that if $x$ is irrational, the inequality

$$|qx - p| < \frac{1}{q}$$

has infinitely many integral solutions $q$ and $p$ with $q > 0$. Finally,
the investigation, in Chapter 8, of the numerically smallest nonzero
value assumed by an indefinite binary quadratic form hinged on
Hurwitz' theorem, which states that the inequality

$$|qx - p| < \frac{1}{\sqrt{5}\,q}$$

has infinitely many integral solutions $q$ and $p$ with $q > 0$, if $x$ is
irrational. These theorems, while quantitatively different, all tell
something about how small the absolute value of the linear form
$qx - p$ can be made if the integers $q$ and $p$ are not both zero. Several
generalizations of this problem come to mind at once, involving either
a larger number of variables, or more than one such form, or both.
The investigation of the behavior of such sets of forms is a central
problem in the theory of Diophantine approximations; while many
results have been obtained, few of them have the quantitative precision
of Hurwitz' theorem, which becomes false if $\sqrt{5}$ is replaced by
any larger constant. (This statement has not yet been proved; it is
a consequence of Theorem 9-9.) One reason for this is that it is only
in the simple case of one linear form in two variables, that a simple 


algorithm can be constructed which yields all the pairs $p, q$ for which
$|qx - p|$ is “small,” in a sense which will be made explicit below.
Naturally, it is much easier to investigate the small values of a function
if one knows what values to use for the arguments. One of the
objects of this chapter is to develop this algorithm.

Rewriting Hurwitz' inequality in the form

$$q^2\left|x - \frac{p}{q}\right| < \frac{1}{\sqrt{5}},$$
we see that we are here concerned with a notion of “good” rational
approximations to an irrational number which differs essentially from
that generally understood in analysis. There, we say that $p/q$ is a
better approximation to $x$ than is $r/s$ if

$$\left|x - \frac{p}{q}\right| < \left|x - \frac{r}{s}\right|.$$


The question of finding this kind of good approximation is rather
uninteresting arithmetically, although of course it may be necessary
to use approximate values of irrational numbers in arithmetic investigations.
What is involved in the theorems we are now discussing
is a comparison of the exactness of the approximation with the size
of the denominator of the fraction used, the comparison being effected
by taking the product


$$q^2 \cdot \left|x - \frac{p}{q}\right|.$$
At least if $x$ is irrational, the first factor in this product gets large as
the second approaches zero; for out of all the elements of an arbitrary
Farey sequence $F_N$ there is one which is nearest to $x$, so to find a
nearer rational number it is necessary to consider fractions whose
denominators exceed $N$. To require the above product to be small is
therefore a much stronger condition than that imposed in analysis.
Instead of searching immediately for appropriate fractions corresponding
to a given $x$, it is fruitful (and indeed necessary, to make
precise the meaning of “appropriate”) to make $x$, instead of $p/q$, the
unknown quantity. That is, we fix a rational number $p/q$, and ask
what numbers should be considered as having $p/q$ as a good approximation.
Put so crudely, the question is unanswerable; we must
decide what other rational numbers are competing with $p/q$. It




seems natural to consider just the elements of some Farey sequence
$F_N$ which contains $p/q$, and to say that $p/q$ is a “good” approximation
to $x$ if, for some $N \ge q$, $|qx - p| \le |sx - r|$ for all $r/s$ in $F_N$.
This is perhaps most easily thought of this way: we measure distance
from $p/q$ in $F_N$, not by the usual expression $|x - p/q|$, but by
$q|x - p/q|$, so that there is an individual measuring rod (or, more
briefly, a metric) associated with each element of $F_N$. It is clear that
“distances” increase more rapidly (in comparison with ordinary
length) when measured from an element of $F_N$ with large denominator
than from an element with small denominator. We now associate
with $p/q$ all those points $x$ such that the “distance” $|qx - p|$ from
$p/q$ is less than or equal to the “distance” $|sx - r|$ from an arbitrary
element $r/s$ of $F_N$. Call this set of points $R_N(p, q)$; formally,
$R_N(p, q)$ is the set of $x$ such that
$$\min_{r/s \in F_N}(|sx - r|) = |qx - p|.$$

Clearly, $p/q$ itself is in $R_N(p, q)$.

Each of these sets $R_N(p, q)$ consists of a single interval. To see this,
we first prove that if $p/q$ and $r/s$ are adjacent in $F_N$, then no point $x$
between them belongs to any $R_N(t, u)$, if $t/u$ is neither $p/q$ nor $r/s$.
This is obvious if $x$ is either $p/q$ or $r/s$. Suppose that

$$\frac{t}{u} < \frac{p}{q} < x < \frac{r}{s};$$

the other possible order, in which $t/u > r/s$, is treated similarly. If
$q \le u$, then

$$0 < qx - p = q\left(x - \frac{p}{q}\right) < q\left(x - \frac{t}{u}\right) \le u\left(x - \frac{t}{u}\right) = ux - t,$$

so if the assertion is false, it must be that $q > u$. But then if
$$qx - p \ge ux - t,$$

so that
$$x \ge \frac{p - t}{q - u},$$
we have
$$sx - r \ge \frac{s(p - t) - r(q - u)}{q - u} = \frac{(ur - st) - (qr - sp)}{q - u} \ge 0,$$

since $qr - sp = 1$ while $ur - st \ge 1$. This contradiction shows that




$R_N(p, q)$ does not extend past the two elements to which $p/q$ is
adjacent in $F_N$. But the condition

$$|qx - p| \le |sx - r|$$

gives
$$qx - p \le r - sx,$$
or
$$x \le \frac{p + r}{q + s},$$

so that $R_N(p, q)$ consists of all $x$ between the two points which are the
mediants of $p/q$ and its immediate neighbors in $F_N$.

In particular, it follows that the new points which appear in going
from $F_N$ to $F_{N+1}$ always appear at end points of intervals $R_N$.

We now adopt the following convention: if for some $N$, the number
$x$ is a point of $R_N(p, q)$ then $p/q$ will be called a best approximation
to $x$. For this $N$, $|qx - p|$ is less than or equal to any expression
$|sx - r|$ with $s \le N$; a fortiori, $|x - p/q|$ is less than or equal to
$|x - r/s|$ if $s \le q$, so that if $p/q$ is a best approximation to $x$ in our
present sense, it is also the rational number closest to $x$ (in the ordinary
sense) out of all those with denominators not exceeding $q$.

There is thus associated with a fixed $x$ a unique sequence of best
approximations; the sequence will be infinite unless $x$ is rational, in
which case $x$ lies inside its own interval $R_N$, for $N$ greater than or
equal to the denominator of $x$. (For rational $x$, the sequence is not
quite unique, since $x$ is a common end point of two intervals $R_N$ for
some $N$.) If $N \ge q$, $R_{N+1}(p, q)$ is contained in $R_N(p, q)$, so that if
$p/q$ is a best approximation to $x$, then certainly $x$ is in $R_q(p, q)$. If $h$
is the largest non-negative integer for which $x \in R_{q+h}(p, q)$, then
$x \in R_N(p, q)$ for $q \le N \le q + h$. Thus, if for fixed $x$ we define $a_N/b_N$
for $N = 1, 2, \ldots$ as that rational number such that $x \in R_N(a_N, b_N)$,
then for suitable $N_0, N_1, \ldots$ we have $1 = N_0 < N_1 < N_2 < \cdots$ and

$$\begin{aligned}
\frac{a_{N_0}}{b_{N_0}} = \frac{a_1}{b_1} = \frac{a_2}{b_2} = \cdots &= \frac{a_{N_1 - 1}}{b_{N_1 - 1}} \neq \frac{a_{N_1}}{b_{N_1}}, \\
\frac{a_{N_1}}{b_{N_1}} = \frac{a_{N_1 + 1}}{b_{N_1 + 1}} = \cdots &= \frac{a_{N_2 - 1}}{b_{N_2 - 1}} \neq \frac{a_{N_2}}{b_{N_2}}, \\
\frac{a_{N_2}}{b_{N_2}} = \frac{a_{N_2 + 1}}{b_{N_2 + 1}} = \cdots &= \frac{a_{N_3 - 1}}{b_{N_3 - 1}} \neq \frac{a_{N_3}}{b_{N_3}}, \quad \ldots
\end{aligned}$$

Since $a_{N_k - 1}/b_{N_k - 1}$ and $a_{N_k}/b_{N_k}$ are adjacent elements of $F_{N_k}$, we have
$$|b_{N_k}a_{N_k - 1} - b_{N_k - 1}a_{N_k}| = 1, \qquad k = 1, 2, \ldots. \tag{1}$$

Now consider the following problem: given a real number $x$, to find
a systematic method for determining the sequence of best approximations
to $x$. We begin by reducing $x$ by its greatest integer $[x] = \lambda_0$; the
new number $x' = x - [x]$ is then in the interval $(0, 1)$. Put $P_0 = \lambda_0$,
$Q_0 = 1$, so that $P_0/Q_0$ is or is not $a_1/b_1$ according as the fractional
part $x'$ of $x$ is less than or greater than $\tfrac{1}{2}$. (In what follows, we shall
assume that if $x$ is rational, its denominator is sufficiently large that
equality does not occur in statements such as the preceding one. This
point will be considered in detail later.) If $P_0/Q_0 = a_1/b_1$, we put
$P_k/Q_k = a_{N_k}/b_{N_k}$, while if $P_0/Q_0 \neq a_1/b_1$ we put $P_k/Q_k = a_{N_{k-1}}/b_{N_{k-1}}$,
for $k = 1, 2, \ldots$. Thus the sequence $\{P_k/Q_k\}$ coincides with
$\{a_{N_k}/b_{N_k}\}$, except that $P_0/Q_0$ may not be a best approximation. If
we also put $P_{-1} = 1$, $Q_{-1} = 0$, then

$$Q_0P_{-1} - Q_{-1}P_0 = 1. \tag{2}$$


The numbers $P_1/Q_1, P_2/Q_2, \ldots$ are now to be determined. It turns
out that this can be done using an algorithm, closely related to the
Euclidean algorithm, of considerable importance in many branches of
mathematics. Unfortunately, the deduction of this algorithm is
necessarily somewhat complicated, since one must obtain the sequences
$\{P_k\}$ and $\{Q_k\}$ from three others yet to be defined: $\{\alpha_k\}$, $\{x_k\}$,
and $\{\lambda_k\}$. The final result, however, is quite simple.

If $P_0/Q_0 = a_1/b_1$, the relation

$$|Q_kP_{k-1} - Q_{k-1}P_k| = 1, \qquad k = 0, 1, 2, \ldots, \tag{3}$$
holds, by (1) and (2). If $P_0/Q_0 \neq a_1/b_1$, then
$$P_1 = P_0 + 1, \qquad Q_0 = Q_1 = 1,$$
and
$$|Q_1P_0 - Q_0P_1| = 1, \tag{4}$$

so that (3) again holds, by (1), (2), and (4). The relation (3) is
therefore always valid.

The numbers $P_k$ and $Q_k$ are now defined recursively, as follows:
$P_{-1} = 1$, $Q_{-1} = 0$, $P_0 = [x]$, $Q_0 = 1$, and, for $k \ge 1$, $P_k$ and $Q_k$
constitute that solution $p, q$ of the inequality


$$|qx - p| < |Q_{k-1}x - P_{k-1}|$$

for which $q$ is positive and minimal. If we put $\alpha_k = Q_kx - P_k$ for
$k = 0, 1, \ldots$, we must find the minimal solution of the inequality

$$|qx - p| < |\alpha_{k-1}|.$$


Fortunately, we need not consider all pairs $p, q$, but only those for
which

\[ |P_{k-1} q - Q_{k-1} p| = 1, \]

on account of (3). Since we know that one solution of this equation
is $q = Q_{k-2}$, $p = P_{k-2}$, it follows that every solution is of the form

\[ q = \epsilon(Q_{k-2} + \lambda Q_{k-1}), \qquad p = \epsilon(P_{k-2} + \lambda P_{k-1}), \tag{5} \]

where $\epsilon = \pm 1$ and $\lambda$ is an integer, so that

\[ |qx - p| = |\lambda(Q_{k-1}x - P_{k-1}) + (Q_{k-2}x - P_{k-2})| = |\lambda\alpha_{k-1} + \alpha_{k-2}|. \]


Thus we can rephrase the definition of $P_k$ and $Q_k$: if $k \ge 1$ and
$P_l$ and $Q_l$ are known for $l < k$, then $P_k$, $Q_k$ are the $p$ and $q$ of
equations (5) if $\lambda$ and $\epsilon$ are so determined that

\[ |\lambda\alpha_{k-1} + \alpha_{k-2}| < |\alpha_{k-1}|; \qquad \epsilon(\lambda Q_{k-1} + Q_{k-2}) \text{ is positive and minimal.} \]


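Since $Q_{k-1} > 0$, these conditions are equivalent to the following:

\[ \left|\lambda - \left(-\frac{\alpha_{k-2}}{\alpha_{k-1}}\right)\right| < 1; \qquad \epsilon\left(\lambda - \left(-\frac{Q_{k-2}}{Q_{k-1}}\right)\right) \text{ positive and minimal.} \tag{$6_k$} \]

To see how to solve $(6_k)$, let us consider the case $k = 1$. We have

\[ \alpha_{-1} = 0 \cdot x - 1 = -1, \qquad \alpha_0 = 1 \cdot x - [x], \]

so that $(6_k)$ becomes

\[ \left|\lambda - \frac{1}{x - [x]}\right| < 1; \qquad \epsilon\left(\lambda - \left(-\frac{0}{1}\right)\right) \text{ positive and minimal.} \tag{$6_1$} \]

The number

\[ x_1 = \frac{1}{x - [x]} \]

is larger than 1, so that the two integral solutions $\lambda$ of the inequality
of $(6_1)$ are positive; the solution of $(6_1)$ is clearly

\[ \lambda = \lambda_1 = [x_1], \qquad \epsilon = +1. \]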


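This gives

\[ P_1 = \lambda_1 P_0 + P_{-1}, \qquad Q_1 = \lambda_1 Q_0 + Q_{-1}, \]
\[ Q_1 P_0 - Q_0 P_1 = (\lambda_1 Q_0 + Q_{-1})P_0 - Q_0(\lambda_1 P_0 + P_{-1}) = -(Q_0 P_{-1} - Q_{-1} P_0), \]

and since $Q_0 P_{-1} - Q_{-1} P_0 = 1 \cdot 1 - 0 \cdot [x] = 1$,
we have $Q_1 P_0 - Q_0 P_1 = -1$.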

These calculations provide a basis for an inductive proof of the 
following theorem. 

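THEOREM 9-1. Put $x_0 = x$, and define

\[ x_k = \frac{1}{x_{k-1} - [x_{k-1}]} \qquad \text{for } 1 \le k < n + 1, \]

where $n$ is the smallest index for which $x_n - [x_n] = 0$, if there is such,
and is infinity otherwise. Then for $1 \le k < n + 1$,

\[ x_k = -\frac{\alpha_{k-2}}{\alpha_{k-1}}, \tag{7} \]

and the solution $\lambda, \epsilon$ of $(6_k)$ is $\lambda = \lambda_k = [x_k]$, $\epsilon = +1$. Hence
$\{P_k/Q_k\}$ is recursively defined by the equations

\[ \begin{aligned} P_{-1} &= 1, & P_0 &= \lambda_0, & P_k &= P_{k-1}\lambda_k + P_{k-2}, \\ Q_{-1} &= 0, & Q_0 &= 1, & Q_k &= Q_{k-1}\lambda_k + Q_{k-2}, \end{aligned} \qquad \text{for } 1 \le k < n + 1, \tag{8} \]

and

\[ Q_k P_{k-1} - Q_{k-1} P_k = (-1)^k \qquad \text{for } 0 \le k < n + 1. \tag{9} \]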


Proof: As we have just seen, all the assertions of the theorem are 
true when k = 1, and (9) holds for k = 0. Suppose the assertions 
true for some k < n + 1, and for all indices smaller than this k. We 
wish to determine $P_{k+1}$ and $Q_{k+1}$ by solving $(6_{k+1})$.

If $n$ is finite and $k + 1 = n + 1$, then $x_n - [x_n] = x_k - \lambda_k = 0$.
Reversing the argument that led to $(6_k)$, this means that
$Q_n x - P_n = 0$, or that $x = P_n/Q_n$, and the sequence $\{P_k/Q_k\}$
terminates with $P_n/Q_n$. Thus the entire sequence of best approximations
has already been determined when $k = n$, so that we need only
consider the case $k < n$.


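If $k < n$, we must solve

\[ \left|\lambda - \left(-\frac{\alpha_{k-1}}{\alpha_k}\right)\right| < 1; \qquad \epsilon\left(\lambda - \left(-\frac{Q_{k-1}}{Q_k}\right)\right) \text{ positive and minimal.} \tag{$6_{k+1}$} \]

From the induction hypothesis and the definition of $x_{k+1}$, we have

\[ -\frac{\alpha_{k-1}}{\alpha_k} = -\frac{\alpha_{k-1}}{Q_k x - P_k} = -\frac{\alpha_{k-1}}{(Q_{k-1}\lambda_k + Q_{k-2})x - (P_{k-1}\lambda_k + P_{k-2})} = -\frac{\alpha_{k-1}}{\alpha_{k-1}\lambda_k + \alpha_{k-2}} = \frac{1}{-\alpha_{k-2}/\alpha_{k-1} - \lambda_k} = \frac{1}{x_k - [x_k]} = x_{k+1}. \]

Since $-Q_{k-1}/Q_k < 0$, the solution of $(6_{k+1})$ is clearly

\[ \lambda = \lambda_{k+1} = [x_{k+1}], \qquad \epsilon = +1, \]

whence

\[ Q_{k+1} = Q_k\lambda_{k+1} + Q_{k-1}, \qquad P_{k+1} = P_k\lambda_{k+1} + P_{k-1}. \]

Moreover,

\[ Q_{k+1}P_k - Q_k P_{k+1} = (Q_k\lambda_{k+1} + Q_{k-1})P_k - Q_k(P_k\lambda_{k+1} + P_{k-1}) = -(Q_k P_{k-1} - Q_{k-1}P_k) = (-1)^{k+1}, \]

by the induction hypothesis, and the proof is complete.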


To see how Theorem 9-1 solves the problem of finding the best
approximations to $x$, take $x = \sqrt2$. Then $\lambda_0 = [\sqrt2] = 1$ and

\[ x_1 = \frac{1}{\sqrt2 - 1} = \sqrt2 + 1, \qquad \lambda_1 = [\sqrt2 + 1] = 2, \]
\[ x_2 = \frac{1}{\sqrt2 + 1 - 2} = \sqrt2 + 1, \qquad \lambda_2 = [\sqrt2 + 1] = 2, \]

and in general, $x_k = \sqrt2 + 1$ and $\lambda_k = 2$, for $k \ge 1$. Hence

\[ P_{-1} = 1, \quad P_0 = 1, \quad P_k = 2P_{k-1} + P_{k-2}, \]

so that $\{P_k\} = 1, 1, 3, 7, 17, \ldots$, and

\[ Q_{-1} = 0, \quad Q_0 = 1, \quad Q_k = 2Q_{k-1} + Q_{k-2}, \]

so that $\{Q_k\} = 0, 1, 2, 5, 12, \ldots$.
Thus the best approximations to $\sqrt2$ are

\[ \frac11, \ \frac32, \ \frac75, \ \frac{17}{12}, \ \ldots. \]

Of course, not every $x$ will give a constant sequence of $\lambda$'s, as
happens with $\sqrt2$. In general, while arbitrarily many $\lambda$'s can be
determined, no explicit (i.e., nonrecursive) formula for the entire
sequence can be given.
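The recursion of Theorem 9-1 is easy to mechanize. The following is a minimal sketch, not part of the original text: the function name `convergents` is hypothetical, and the partial quotients $\lambda_0, \lambda_1, \ldots$ are assumed to be supplied as a list.

```python
def convergents(quotients):
    """Yield (P_k, Q_k), k = 0, 1, ..., from the partial quotients
    lambda_0, lambda_1, ... by the recursion of Theorem 9-1."""
    p_prev, q_prev = 1, 0          # P_{-1}, Q_{-1}
    p, q = quotients[0], 1         # P_0 = lambda_0, Q_0 = 1
    yield p, q
    for lam in quotients[1:]:
        # P_k = P_{k-1} * lambda_k + P_{k-2}, and likewise for Q_k
        p, p_prev = p * lam + p_prev, p
        q, q_prev = q * lam + q_prev, q
        yield p, q


# sqrt(2) has lambda_0 = 1 and lambda_k = 2 for k >= 1
sqrt2_convergents = list(convergents([1, 2, 2, 2, 2]))
```

With the quotients of $\sqrt2$ this reproduces the pairs $1/1, 3/2, 7/5, 17/12, 41/29$, and consecutive pairs satisfy the determinant relation (9).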


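THEOREM 9-2. If $\{x_k\}$, $\{\lambda_k\}$, $\{P_k\}$, and $\{Q_k\}$ are determined as in
Theorem 9-1, then the following relations hold:

\[ x = \frac{P_{k-1}x_k + P_{k-2}}{Q_{k-1}x_k + Q_{k-2}}, \qquad 1 \le k < n + 1, \tag{10} \]

\[ x = \lambda_0 + \cfrac{1}{\lambda_1 + \cfrac{1}{\lambda_2 + \dotsb + \cfrac{1}{\lambda_{k-1} + \cfrac{1}{x_k}}}}, \qquad 1 \le k < n + 1, \tag{11} \]

\[ \frac{P_k}{Q_k} = \lambda_0 + \cfrac{1}{\lambda_1 + \cfrac{1}{\lambda_2 + \dotsb + \cfrac{1}{\lambda_k}}}, \qquad 1 \le k < n + 1. \tag{12} \]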

Proof: From (7) and the definition of the $\alpha$'s, we obtain

\[ x_k = -\frac{Q_{k-2}x - P_{k-2}}{Q_{k-1}x - P_{k-1}}, \]

which yields (10). The definition of $\{x_k\}$ and $\{\lambda_k\}$ gives

\[ x_0 = \lambda_0 + \frac{1}{x_1}, \qquad x_1 = \lambda_1 + \frac{1}{x_2}, \qquad \ldots, \qquad x_{k-1} = \lambda_{k-1} + \frac{1}{x_k}, \tag{13} \]

and if we successively eliminate $x_1, x_2, \ldots, x_{k-1}$, equation (11)
results. To obtain (12), consider the equations (13) with $x_k$ as an
independent variable which assumes values greater than 1, with
fixed $\lambda_0, \lambda_1, \ldots, \lambda_{k-1}$. Then $x$ is a function of $x_k$, given more briefly
by (11). Since $P_{k-1}$, $P_{k-2}$, $Q_{k-1}$, and $Q_{k-2}$ depend only on
$\lambda_0, \ldots, \lambda_{k-1}$, (10) and (11) are different expressions for the same
functional relation. If in (10) $x_k$ is given the value $\lambda_k$, then $x = P_k/Q_k$,
and substituting these values in (11) gives (12).
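For a rational $x$ the sequences $\{x_k\}$ and $\{\lambda_k\}$ can be computed exactly, and identity (10) checked directly. The sketch below is illustrative only; the names `expand` and `identity_10_holds` are hypothetical, and exact rational arithmetic from Python's `fractions` module is assumed.

```python
from fractions import Fraction


def expand(x):
    """Return ([lambda_0, ...], [x_0, ...]) for a rational x, using
    lambda_k = [x_k] and x_{k+1} = 1/(x_k - [x_k]) as in Theorem 9-1."""
    x = Fraction(x)
    lams, xs = [], []
    while True:
        lam = x.numerator // x.denominator   # the integral part [x_k]
        lams.append(lam)
        xs.append(x)
        if x == lam:                         # x_n - [x_n] = 0: the process stops
            return lams, xs
        x = 1 / (x - lam)


def identity_10_holds(x):
    """Check x = (P_{k-1} x_k + P_{k-2}) / (Q_{k-1} x_k + Q_{k-2}) for 1 <= k <= n."""
    x = Fraction(x)
    lams, xs = expand(x)
    p_prev, q_prev, p, q = 1, 0, lams[0], 1
    for k in range(1, len(lams)):
        if x != (p * xs[k] + p_prev) / (q * xs[k] + q_prev):
            return False
        p, p_prev = p * lams[k] + p_prev, p
        q, q_prev = q * lams[k] + q_prev, q
    return Fraction(p, q) == x               # the last convergent is x itself
```

For instance, `expand(Fraction(415, 93))` produces the quotients 4, 2, 6, 7, and the final pair $P_n/Q_n$ equals $415/93$.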


PROBLEMS 


1. Carry through the procedure described in this section to find the first 
few best approximations to $\sqrt3$.

2. Find all the best approximations to 339/62.

3. Show that if $x = \tfrac12(1 + \sqrt5)$, then each $\lambda_k$ is 1.


9-2 The rational case. We now suppose that $x$ is rational. If $x$ is
an interior point of $R_{Q_k}(P_k, Q_k)$, we have the strict inequality
$|Q_k x - P_k| < |Q_{k-1}x - P_{k-1}|$, or $|\lambda_k - x_k| < 1$. It may happen,
however, that for some $r/s$ with $s > Q_{k-1}$, $x$ is the common end point
of the abutting intervals $R_s(P_{k-1}, Q_{k-1})$ and $R_s(r, s)$, while $x$ is an
interior point of $R_{s-1}(P_{k-1}, Q_{k-1})$. In this case, it is a matter of
choice whether $r/s$ is to be included among the best approximations
to $x$; it has not been included up to now, since we have required the
strict inequality $|\lambda_k - x_k| < 1$. Fortunately, this ambiguity occurs
only once for each $x$, for we know from earlier calculations that $x$ is
the mediant of $P_{k-1}/Q_{k-1}$ and $r/s$:

\[ x = \frac{P_{k-1} + r}{Q_{k-1} + s}, \tag{14} \]

so that $x$ is the first rational number to appear between $P_{k-1}/Q_{k-1}$ and
$r/s$ in the sequences $F_s, F_{s+1}, \ldots$. Hence $k = n$, and if $r/s$ is to be
included among the best approximations, and if we put $r/s = P_n/Q_n$,
then $P_{n+1}/Q_{n+1} = x$. Comparing equations (8) and (14), we see
that $\lambda_{n+1} = 1$, and by (12),

\[ x = \lambda_0 + \cfrac{1}{\lambda_1 + \dotsb + \cfrac{1}{\lambda_n + \cfrac{1}{1}}}. \]

If $r/s$ is not included, then $x = P_n/Q_n$ and

\[ x = \lambda_0 + \cfrac{1}{\lambda_1 + \dotsb + \cfrac{1}{(\lambda_n + 1)}}. \]
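To illustrate, take $x = \tfrac23$. Then

\[ x_1 = \frac{1}{\tfrac23 - 0} = \frac32, \qquad x_2 = \frac{1}{\tfrac32 - 1} = 2, \]
\[ \lambda_0 = [\tfrac23] = 0, \qquad \lambda_1 = [x_1] = 1, \qquad \lambda_2 = [x_2] = 2, \]

and we have

\[ \frac23 = 0 + \cfrac{1}{1 + \cfrac12}, \qquad \frac{P_0}{Q_0} = \frac01, \quad \frac{P_1}{Q_1} = \frac11, \quad \frac{P_2}{Q_2} = \frac23. \]

But we could also write

\[ \frac23 = 0 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac11}}, \qquad \frac{P_0}{Q_0} = \frac01, \quad \frac{P_1}{Q_1} = \frac11, \quad \frac{P_2}{Q_2} = \frac12, \quad \frac{P_3}{Q_3} = \frac23. \]

In this case, $\tfrac23$ is the right end point of $R_2(1, 2)$ and the left end point
of $R_2(1, 1)$; with the normal procedure, $\tfrac12$ would not be included
among the best approximations to $\tfrac23$.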
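The two expansions of $\tfrac23$ can be checked mechanically. In this small sketch the helper names `value` and `alternate` are hypothetical; `alternate` replaces a last quotient $\lambda_n > 1$ by the pair $\lambda_n - 1, 1$, which is the variation described above.

```python
from fractions import Fraction


def value(quotients):
    """Evaluate lambda_0 + 1/(lambda_1 + 1/(... + 1/lambda_n)) from the inside out."""
    x = Fraction(quotients[-1])
    for lam in reversed(quotients[:-1]):
        x = lam + 1 / x
    return x


def alternate(quotients):
    """Rewrite [..., lambda_n] as [..., lambda_n - 1, 1]; requires lambda_n > 1."""
    if quotients[-1] <= 1:
        raise ValueError("last partial quotient must exceed 1")
    return quotients[:-1] + [quotients[-1] - 1, 1]
```

Here `value([0, 1, 2])` and `value(alternate([0, 1, 2]))` both give $\tfrac23$, while the longer list has the extra convergent $\tfrac12$.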
An expression

\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \dotsb + \cfrac{1}{a_n}}} \tag{15} \]

is called a finite regular continued fraction; it is finite because there are
only finitely many $a$'s, and regular because the $a$'s are integers,
$a_1, \ldots, a_n$ are positive, and the numerators are all $+1$. We shall deal
only with regular continued fractions in this book. For typographical
simplicity, we write

\[ a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2 +} \cdots \frac{1}{a_n} \tag{16} \]

in place of (15). The numbers

\[ \frac{p_0}{q_0} = a_0, \qquad \frac{p_1}{q_1} = a_0 + \frac{1}{a_1}, \qquad \frac{p_2}{q_2} = a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2}, \qquad \ldots \]

are called the convergents of the continued fraction (16). If (16) has
the value $x$, we can put

\[ x = a_0 + \frac{1}{a_1 +} \cdots \frac{1}{a_{k-1} +}\,\frac{1}{x_k'}, \]

where

\[ x_k' = a_k + \frac{1}{a_{k+1} +} \cdots \frac{1}{a_n}, \qquad \text{or} \qquad x_k' = a_k + \frac{1}{x_{k+1}'}, \]

for $k = 1, 2, \ldots, n$. Since $x_k' > 1$, we have

\[ x_1' = \frac{1}{x - a_0} = \frac{1}{x - [x]}, \qquad a_1 = [x_1'], \]
\[ x_2' = \frac{1}{x_1' - a_1} = \frac{1}{x_1' - [x_1']}, \qquad a_2 = [x_2'], \qquad \ldots. \]

Thus the sequence $\{x_k'\}$ is identical with the sequence $\{x_k\}$ defined in
Theorem 9-1, and $\{a_k\}$ is therefore identical with $\{\lambda_k\}$. Hence we
have the following theorem.


THEOREM 9-3. The convergents (possibly excepting $p_0/q_0$) of any
finite regular continued fraction are the best approximations to the
value of the continued fraction. Every rational number can be
expanded into a finite regular continued fraction, and this expansion is
unique, except for the variation indicated by the identity

\[ a_0 + \frac{1}{a_1 +} \cdots \frac{1}{a_n} = a_0 + \frac{1}{a_1 +} \cdots \frac{1}{(a_n - 1) +}\,\frac{1}{1}. \]

Moreover, the identities

(a) $p_0 = a_0$, $\quad p_k = p_{k-1}a_k + p_{k-2}$;

(b) $q_0 = 1$, $\quad q_k = q_{k-1}a_k + q_{k-2}$;

(c) $q_k p_{k-1} - q_{k-1}p_k = (-1)^k$;

(d) $x = \dfrac{p_{k-1}x_k + p_{k-2}}{q_{k-1}x_k + q_{k-2}}$

hold for $k = 1, 2, \ldots, n$, if we define $p_{-1} = 1$, $q_{-1} = 0$, and

\[ x_k = a_k + \frac{1}{a_{k+1} +} \cdots \frac{1}{a_n}. \]


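It might be worth mentioning that the continued fraction expansion
of $x = a/b$ can be deduced immediately from the Euclidean algorithm
as applied to $a$ and $b$. For if

\[ \begin{aligned} a &= b a_0 + r_0, \\ b &= r_0 a_1 + r_1, \\ r_0 &= r_1 a_2 + r_2, \\ &\ \ \vdots \\ r_{n-3} &= r_{n-2}a_{n-1} + r_{n-1}, \\ r_{n-2} &= r_{n-1}a_n, \end{aligned} \]

then

\[ \begin{aligned} \frac ab &= a_0 + \frac{r_0}{b} = a_0 + \frac{1}{b/r_0}, \\ \frac{b}{r_0} &= a_1 + \frac{r_1}{r_0} = a_1 + \frac{1}{r_0/r_1}, \\ \frac{r_0}{r_1} &= a_2 + \frac{r_2}{r_1} = a_2 + \frac{1}{r_1/r_2}, \\ &\ \ \vdots \\ \frac{r_{n-3}}{r_{n-2}} &= a_{n-1} + \frac{r_{n-1}}{r_{n-2}} = a_{n-1} + \frac{1}{r_{n-2}/r_{n-1}}, \\ \frac{r_{n-2}}{r_{n-1}} &= a_n, \end{aligned} \]

so that

\[ \frac ab = a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2 +} \cdots \frac{1}{a_n}. \]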
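The connection with the Euclidean algorithm can be sketched in a few lines. The function name `continued_fraction` is hypothetical; the code assumes $b > 0$ and relies on Python's `divmod`, which rounds the quotient toward $-\infty$, so that $a_0 = [a/b]$ also for negative $a$.

```python
def continued_fraction(a, b):
    """Partial quotients a_0, a_1, ..., a_n of a/b (b > 0): the successive
    quotients of the Euclidean algorithm applied to a and b."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)      # a = b*q + r with 0 <= r < b
        quotients.append(q)
        a, b = b, r
    return quotients
```

For example, $415 = 93 \cdot 4 + 43$, $93 = 43 \cdot 2 + 7$, $43 = 7 \cdot 6 + 1$, $7 = 1 \cdot 7$, so the quotients of $415/93$ are 4, 2, 6, 7.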


PROBLEMS 


1. Prove that the convergents $p_k/q_k$ are in reduced form, i.e., that
$(p_k, q_k) = 1$.

2. If $x = a/b$, where $(a, b) = 1$, and

\[ \frac ab = a_0 + \frac{1}{a_1 +} \cdots \frac{1}{a_n}, \]

then $p_n = a$, $q_n = b$. Use this and an identity of Theorem 9-3 to find a
solution of the linear Diophantine equation $ax + by = c$. In particular,
find a general solution of $247x + 77y = 31$.

3. Show that for $k \le n$,

\[ \frac{q_k}{q_{k-1}} = a_k + \frac{1}{a_{k-1} +} \cdots \frac{1}{a_2 +}\,\frac{1}{a_1}. \]


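9-3 The irrational case. Now consider the case that $x$ is an
irrational number $\xi$. The sequences $\{x_k\}$, $\{\lambda_k\}$, and $\{P_k/Q_k\}$ are now
infinite, and we write

\[ \xi = \lambda_0 + \frac{1}{\lambda_1 +}\,\frac{1}{\lambda_2 +} \cdots. \tag{17} \]

This equation must be understood as an abbreviation for the equation

\[ \xi = \lim_{n \to \infty}\left(\lambda_0 + \frac{1}{\lambda_1 +} \cdots \frac{1}{\lambda_n}\right) = \lim_{n \to \infty} \frac{P_n}{Q_n}; \tag{18} \]

the convergents $P_n/Q_n$ play a role here analogous to that of the partial
sums of an infinite series.

Conversely, if we start with an arbitrary infinite regular continued
fraction

\[ a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2 +} \cdots, \tag{19} \]

we can show that the convergents

\[ \frac{p_0}{q_0} = a_0, \qquad \frac{p_1}{q_1} = a_0 + \frac{1}{a_1}, \qquad \frac{p_2}{q_2} = a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2}, \qquad \ldots \]

always converge to an irrational number $\xi$. For take $n \ge 2$ and put

\[ \frac pq = a_0 + \frac{1}{a_1 +} \cdots \frac{1}{a_n}. \]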

Then by Theorem 9-1, the numbers $a_0, a_1, \ldots, a_{n-2}$ are uniquely
determined by $p/q$, and the convergents of $p/q$, which are also
convergents of (19), satisfy the usual recursion relations:

\[ \begin{aligned} p_{-1} &= 1, & p_0 &= a_0, & p_k &= p_{k-1}a_k + p_{k-2}, \\ q_{-1} &= 0, & q_0 &= 1, & q_k &= q_{k-1}a_k + q_{k-2}, \end{aligned} \tag{20} \]

for $k = 1, 2, \ldots, n - 2$. Moreover,

\[ \begin{aligned} q_k p_{k-1} - q_{k-1}p_k &= (q_{k-1}a_k + q_{k-2})p_{k-1} - q_{k-1}(p_{k-1}a_k + p_{k-2}) \\ &= -(q_{k-1}p_{k-2} - q_{k-2}p_{k-1}) = \cdots \\ &= (-1)^k(q_0 p_{-1} - q_{-1}p_0), \end{aligned} \]

so that

\[ q_k p_{k-1} - q_{k-1}p_k = (-1)^k \tag{21} \]

for $k = 0, 1, \ldots, n - 2$. Since $n$ is arbitrary, the relations (20)
and (21) hold for all $k \ge 1$. By (21),

\[ \begin{aligned} q_k p_{k-2} - q_{k-2}p_k &= (q_{k-1}a_k + q_{k-2})p_{k-2} - q_{k-2}(p_{k-1}a_k + p_{k-2}) \\ &= a_k(q_{k-1}p_{k-2} - q_{k-2}p_{k-1}) \\ &= (-1)^{k-1}a_k. \end{aligned} \tag{22} \]
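From (22), we see that

\[ \frac{p_{2k-2}}{q_{2k-2}} < \frac{p_{2k}}{q_{2k}}, \qquad \frac{p_{2k-1}}{q_{2k-1}} > \frac{p_{2k+1}}{q_{2k+1}}, \]

so that

\[ \frac{p_0}{q_0} < \frac{p_2}{q_2} < \frac{p_4}{q_4} < \cdots, \qquad \frac{p_1}{q_1} > \frac{p_3}{q_3} > \frac{p_5}{q_5} > \cdots. \]

By (21),

\[ \frac{p_{2k}}{q_{2k}} < \frac{p_{2k+1}}{q_{2k+1}}, \]

so that

\[ \frac{p_{2k}}{q_{2k}} < \frac{p_{2l+1}}{q_{2l+1}} \]

for every $l \ge k$. Hence

\[ \frac{p_0}{q_0} < \frac{p_2}{q_2} < \frac{p_4}{q_4} < \cdots < \frac{p_5}{q_5} < \frac{p_3}{q_3} < \frac{p_1}{q_1}, \]

so that the sequences $\{p_{2k}/q_{2k}\}$ and $\{p_{2k+1}/q_{2k+1}\}$, being monotonic
and bounded, are convergent. But (21) can be rewritten in the form

\[ \frac{p_{k-1}}{q_{k-1}} - \frac{p_k}{q_k} = \frac{(-1)^k}{q_{k-1}q_k}, \]

and since $q_k \to \infty$ as $k \to \infty$ we see that

\[ \lim_{k \to \infty}\left(\frac{p_{2k}}{q_{2k}} - \frac{p_{2k+1}}{q_{2k+1}}\right) = 0 \]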


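and consequently $\lim p_k/q_k$ exists. Call this limit $\xi$, and put

\[ \xi_k = a_k + \frac{1}{a_{k+1} +}\,\frac{1}{a_{k+2} +} \cdots. \]

It follows, just as in the rational case, that

\[ \xi_1 = \frac{1}{\xi - a_0} = \frac{1}{\xi - [\xi]}, \qquad \ldots, \]

and $a_0 = [\xi]$, $a_1 = [\xi_1]$, $\ldots$,
so that the convergents $p_k/q_k$ of (19) are the best approximations
$P_k/Q_k$ to $\xi$. From this we deduce the following assertions.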
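The interlacing of the even- and odd-indexed convergents derived from (21) and (22) can be observed numerically. This is an illustrative sketch only: the names `convergents` and `interlaced` are hypothetical, and the partial quotients $a_1, a_2, \ldots$ are assumed positive.

```python
from fractions import Fraction


def convergents(quotients):
    """Return [p_0/q_0, p_1/q_1, ...] as exact fractions, via recursion (20)."""
    p_prev, q_prev, p, q = 1, 0, quotients[0], 1
    out = [Fraction(p, q)]
    for a in quotients[1:]:
        p, p_prev = p * a + p_prev, p
        q, q_prev = q * a + q_prev, q
        out.append(Fraction(p, q))
    return out


def interlaced(quotients):
    """True if even-indexed convergents increase, odd-indexed ones decrease,
    and every even-indexed convergent lies below every odd-indexed one."""
    c = convergents(quotients)
    even, odd = c[0::2], c[1::2]
    increasing = all(x < y for x, y in zip(even, even[1:]))
    decreasing = all(x > y for x, y in zip(odd, odd[1:]))
    return increasing and decreasing and max(even) < min(odd)
```

Any list of quotients with at least two entries can be tried; for example `interlaced([2, 1, 1, 1, 4, 1, 1, 1, 4])` holds.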


THEOREM 9-4. Every infinite regular continued fraction converges
to an irrational number, the best approximations to which are afforded
by the convergents of the continued fraction. Every irrational number
can be expanded into an infinite regular continued fraction, and this
expansion is unique. Moreover, the following identities hold, if
$\xi = a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2 +} \cdots$:

\[ \begin{aligned} p_{-1} &= 1, & p_0 &= a_0, & p_k &= p_{k-1}a_k + p_{k-2}, \\ q_{-1} &= 0, & q_0 &= 1, & q_k &= q_{k-1}a_k + q_{k-2}, \end{aligned} \qquad k = 1, 2, 3, \ldots, \tag{23} \]

\[ q_k p_{k-1} - q_{k-1}p_k = (-1)^k, \qquad k = 0, 1, 2, \ldots, \tag{24} \]

\[ q_k p_{k-2} - q_{k-2}p_k = (-1)^{k-1}a_k, \qquad k = 1, 2, 3, \ldots, \tag{25} \]

\[ \xi = \frac{p_{k-1}\xi_k + p_{k-2}}{q_{k-1}\xi_k + q_{k-2}}, \qquad \text{where } \xi_k = a_k + \frac{1}{a_{k+1} +} \cdots. \tag{26} \]


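The numbers $a_k$ are called the partial quotients, and the $\xi_k$ the
complete quotients, in the expansion.
Once the continued fraction expansion of $\xi$ is known, the successive
convergents can be computed very simply. For example, let $\xi = \sqrt7$.
Then

\[ \begin{aligned} \sqrt7 &= 2 + (\sqrt7 - 2), & a_0 &= 2, & \xi_1 &= (\sqrt7 - 2)^{-1}, \\ \xi_1 = \frac{\sqrt7 + 2}{3} &= 1 + \frac{\sqrt7 - 1}{3}, & a_1 &= 1, & \xi_2 &= \left(\frac{\sqrt7 - 1}{3}\right)^{-1}, \\ \xi_2 = \frac{\sqrt7 + 1}{2} &= 1 + \frac{\sqrt7 - 1}{2}, & a_2 &= 1, & \xi_3 &= \left(\frac{\sqrt7 - 1}{2}\right)^{-1}, \\ \xi_3 = \frac{\sqrt7 + 1}{3} &= 1 + \frac{\sqrt7 - 2}{3}, & a_3 &= 1, & \xi_4 &= \left(\frac{\sqrt7 - 2}{3}\right)^{-1}, \\ \xi_4 = \sqrt7 + 2 &= 4 + (\sqrt7 - 2), & a_4 &= 4, & \xi_5 &= (\sqrt7 - 2)^{-1}. \end{aligned} \]

Since $\xi_5 = \xi_1$, also $\xi_6 = \xi_2$, $\xi_7 = \xi_3, \ldots$, so $\{\xi_k\}$ (and therefore also
$\{a_k\}$) is periodic. Thus we have the periodic expansion

\[ \sqrt7 = 2 + \frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{4 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{4 +} \cdots \]

and the following table of convergents:

  k   | -1 | 0 | 1 | 2 | 3 |  4 |  5 |  6
  a_k |    | 2 | 1 | 1 | 1 |  4 |  1 |  1
  p_k |  1 | 2 | 3 | 5 | 8 | 37 | 45 | 82
  q_k |  0 | 1 | 1 | 2 | 3 | 14 | 17 | 31

Here the element $37 = p_4$, for example, is determined by multiplying
$a_4 = 4$ by $p_3 = 8$ and adding $p_2 = 5$. Thus the best approximations
to $\sqrt7$ are $3, \tfrac52, \tfrac83, \tfrac{37}{14}, \ldots$.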
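The computation for $\sqrt7$ can be carried out in integer arithmetic for any nonsquare $d$. The sketch below uses a standard bookkeeping that is an assumption of this example rather than the text's own procedure: each complete quotient is held in the form $\xi_k = (\sqrt d + r)/s$ with integers $r, s$, and the function name `sqrt_quotients` is hypothetical.

```python
from math import isqrt


def sqrt_quotients(d, count):
    """Return the first `count` partial quotients a_0, a_1, ... of sqrt(d),
    keeping each complete quotient as xi_k = (sqrt(d) + r) / s."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    r, s, a = 0, 1, a0             # xi_0 = sqrt(d), a_0 = [sqrt(d)]
    quotients = [a]
    for _ in range(count - 1):
        r = s * a - r              # xi_k - a_k = (sqrt(d) - r) / s_k
        s = (d - r * r) // s       # rationalize: xi_{k+1} = (sqrt(d) + r) / s
        a = (a0 + r) // s          # a_{k+1} = [xi_{k+1}]
        quotients.append(a)
    return quotients
```

For $d = 7$ the pairs $(r, s)$ run through $(2, 3), (1, 2), (1, 3), (2, 1)$ and then repeat, matching $\xi_1, \ldots, \xi_4$ above.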


PROBLEMS


1. It can be verified, using a sufficiently good decimal approximation to
$\pi$, that the beginning of the continued fraction expansion for $\pi$ is

\[ \pi = 3 + \frac{1}{7 +}\,\frac{1}{15 +}\,\frac{1}{1 +}\,\frac{1}{292 +} \cdots. \]

Find the first four best approximations to $\pi$.


2. Show that $\sqrt{13}$ has an expansion which is eventually periodic, and
find the first few convergents.


9-4 Quadratic irrationalities. The problem of finding the best 
approximations to a real number x has thus been completely solved 
in terms of the regular continued fraction expansion of x. Of course, 
unless x is of a very special form, it may be impossible to give the 
complete expansion of x, just as one cannot give the rule of formation 
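for the digits occurring in the decimal expansion of $\pi$. But if a decimal
approximation of $x$ is known, a corresponding part of the continued
fraction expansion of $x$ can be determined quite easily. For
example, from the series expansion for $e$, one can easily show that

\[ 2.7182 < e < 2.7183. \]

By a simple computation, we find that

\[ 2.7182 = 2 + \frac{1}{1 +}\,\frac{1}{2 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{4 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{3 +}\,\frac{1}{1 +}\,\frac{1}{9}, \]
\[ 2.7183 = 2 + \frac{1}{1 +}\,\frac{1}{2 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{4 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{19 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{3}, \]

so that

\[ e = 2 + \frac{1}{1 +}\,\frac{1}{2 +}\,\frac{1}{1 +}\,\frac{1}{1 +}\,\frac{1}{4 +}\,\frac{1}{1 +}\,\frac{1}{1 +} \cdots. \]

(Actually, it is known that the sequence of partial quotients is
$2, 1, 2, 1, 1, 4, 1, 1, 6, 1, \ldots, 1, 2n, 1, \ldots$.)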
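The "simple computation" for $e$ amounts to expanding the two decimal bounds and keeping the partial quotients on which they agree. A minimal sketch, with hypothetical helper names `continued_fraction` and `common_prefix`:

```python
from fractions import Fraction


def continued_fraction(x):
    """Partial quotients of a rational number x (exact arithmetic)."""
    a, b = x.numerator, x.denominator
    quotients = []
    while b != 0:
        q, r = divmod(a, b)
        quotients.append(q)
        a, b = b, r
    return quotients


def common_prefix(u, v):
    """Longest common initial segment of two lists."""
    out = []
    for s, t in zip(u, v):
        if s != t:
            break
        out.append(s)
    return out


lower = continued_fraction(Fraction("2.7182"))
upper = continued_fraction(Fraction("2.7183"))
shared = common_prefix(lower, upper)
```

Every number between the two bounds has an expansion beginning with the shared quotients, which is why they can be attributed to $e$ itself.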
There is, however, one simple case in which the complete expansion
can be determined: that in which the partial quotients $a_0, a_1, a_2, \ldots$
constitute a sequence which is eventually periodic. Consider for
example the continued fraction


where $a_{2n} = 1$ and $a_{2n+1} = 2$ for $n \ge 1$. If $\xi_2$ is related to $\xi$ as in
(26), we have

\[ \xi_2 = 1 + \frac{1}{2 +}\,\frac{1}{1 +}\,\frac{1}{2 +} \cdots = 1 + \frac{1}{2 +}\,\frac{1}{\xi_2}, \]

so that

\[ \xi_2 = 1 + \cfrac{1}{2 + \cfrac{1}{\xi_2}} = 1 + \frac{\xi_2}{2\xi_2 + 1} = \frac{3\xi_2 + 1}{2\xi_2 + 1}, \]
\[ 2\xi_2^2 - 2\xi_2 - 1 = 0, \]
\[ \xi_2 = \frac{1 + \sqrt3}{2}. \]

(The plus sign is chosen since $\xi_2 > 0$.) Hence
1 4V/3 — a w/8 
bei: _4V38-2 = 17- V3. 
34. 3V3—1 13 
V3 —1 
2 


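Conversely, if we start with a quadratic irrationality, say $\xi = 1 + \sqrt6$,
we get

\[ a_0 = [1 + \sqrt6] = 3, \]
\[ \xi_1 = \frac{1}{\sqrt6 - 2} = \frac{\sqrt6 + 2}{2}, \qquad a_1 = \left[\frac{\sqrt6 + 2}{2}\right] = 2, \]
\[ \xi_2 = \cfrac{1}{\cfrac{\sqrt6 + 2}{2} - 2} = \frac{2}{\sqrt6 - 2} = \sqrt6 + 2, \qquad a_2 = 4, \]
\[ \xi_3 = \frac{1}{\sqrt6 - 2} = \xi_1, \qquad \ldots. \]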


We can now show that these are not isolated phenomena. 


THEOREM 9-5. Every eventually periodic regular continued fraction
converges to a quadratic irrationality, and every quadratic irrationality
has a regular continued fraction expansion which is eventually
periodic.


Proof: The first part is quite simple. Suppose that the first period
begins with $a_n$, and let the length of the period be $h$, so that
$a_{k+h} = a_k$ for $k \ge n$. Put

\[ \xi = a_0 + \frac{1}{a_1 +} \cdots \qquad \text{and} \qquad \xi_k = a_k + \frac{1}{a_{k+1} +} \cdots, \]

so that $\xi_{k+h} = \xi_k$ for $k \ge n$. By this and equation (26),

\[ \xi = \frac{p_{n-1}\xi_n + p_{n-2}}{q_{n-1}\xi_n + q_{n-2}} = \frac{p_{n+h-1}\xi_n + p_{n+h-2}}{q_{n+h-1}\xi_n + q_{n+h-2}}, \]

so that $\xi_n$ satisfies a quadratic equation with integral coefficients.
Since $\xi_n$ is obviously not rational, it is a quadratic irrationality. By
(26) again, the same is true of $\xi$ itself, since if

\[ A\xi_n^2 + B\xi_n + C = 0, \]

then

\[ A(-q_{n-2}\xi + p_{n-2})^2 + B(-q_{n-2}\xi + p_{n-2})(q_{n-1}\xi - p_{n-1}) + C(q_{n-1}\xi - p_{n-1})^2 = 0, \]

and this is a quadratic equation in $\xi$.


The proof of the converse involves a little more computation. 
Suppose that 


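\[ A\xi^2 + B\xi + C = 0, \]

where $A$, $B$, and $C$ are integers, and $\xi$ is irrational. Then equation
(26) gives

\[ A(p_{k-1}\xi_k + p_{k-2})^2 + B(p_{k-1}\xi_k + p_{k-2})(q_{k-1}\xi_k + q_{k-2}) + C(q_{k-1}\xi_k + q_{k-2})^2 = 0, \]

or

\[ A_k\xi_k^2 + B_k\xi_k + C_k = 0, \]

where the integers $A_k$, $B_k$, and $C_k$ are given by the equations

\[ \begin{aligned} A_k &= Ap_{k-1}^2 + Bp_{k-1}q_{k-1} + Cq_{k-1}^2, \\ B_k &= 2Ap_{k-1}p_{k-2} + B(p_{k-1}q_{k-2} + p_{k-2}q_{k-1}) + 2Cq_{k-1}q_{k-2}, \\ C_k &= Ap_{k-2}^2 + Bp_{k-2}q_{k-2} + Cq_{k-2}^2. \end{aligned} \]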


If $f(x) = Ax^2 + Bx + C$, then

\[ A_k = q_{k-1}^2 f\!\left(\frac{p_{k-1}}{q_{k-1}}\right), \qquad C_k = q_{k-2}^2 f\!\left(\frac{p_{k-2}}{q_{k-2}}\right). \]

Using Taylor's theorem, we have

\[ A_k = q_{k-1}^2\left[f(\xi) + f'(\xi)\left(\frac{p_{k-1}}{q_{k-1}} - \xi\right) + \frac12 f''(\xi)\left(\frac{p_{k-1}}{q_{k-1}} - \xi\right)^2\right], \]

since $f'''(x)$ is identically zero. Now $f(\xi) = 0$, and

\[ \xi - \frac{p_{k-1}}{q_{k-1}} = \frac{p_{k-1}\xi_k + p_{k-2}}{q_{k-1}\xi_k + q_{k-2}} - \frac{p_{k-1}}{q_{k-1}} = \frac{q_{k-1}p_{k-2} - q_{k-2}p_{k-1}}{q_{k-1}(q_{k-1}\xi_k + q_{k-2})} = \frac{(-1)^{k-1}}{q_{k-1}(q_{k-1}\xi_k + q_{k-2})}; \tag{27} \]

since $\xi_k > 1$, it is certainly true that

\[ \left|\xi - \frac{p_{k-1}}{q_{k-1}}\right| < \frac{1}{q_{k-1}^2}. \]

Hence

\[ |A_k| < |f'(\xi)| + \tfrac12|f''(\xi)|, \]

and similarly

\[ |C_k| < |f'(\xi)| + \tfrac12|f''(\xi)|, \]

so that $|A_k|$ and $|C_k|$ remain bounded as $k \to \infty$.

To see that $|B_k|$ is also bounded, we use the fact that all the
quantities $B_k^2 - 4A_kC_k$ have the common value $B^2 - 4AC = D$. (This
can be proved by a straightforward but tedious computation or, if
one is acquainted with the theory of linear transformations, by noting
that the expression $A_kx'^2 + B_kx'y' + C_ky'^2$ is obtained from
$Ax^2 + Bxy + Cy^2$ by the unimodular substitution

\[ x = p_{k-1}x' + p_{k-2}y', \qquad y = q_{k-1}x' + q_{k-2}y', \]

and that two such forms have the same discriminant.) Since $A_k$ and
$C_k$ are bounded and $D$ is fixed,

\[ B_k^2 = D + 4A_kC_k \]

must be bounded also.


Thus, there is a constant $M$ such that

\[ |A_k| < M, \qquad |B_k| < M, \qquad |C_k| < M \]

for all $k$. Since there are fewer than $(2M + 1)^3$ triples of integers
numerically smaller than $M$, there must be three indices, say $n_1$, $n_2$,
and $n_3$, which give the same triple:

\[ A_{n_1} = A_{n_2} = A_{n_3}, \qquad B_{n_1} = B_{n_2} = B_{n_3}, \qquad C_{n_1} = C_{n_2} = C_{n_3}. \]

Since the equation $A_{n_1}x^2 + B_{n_1}x + C_{n_1} = 0$ has only two roots, two
of the numbers $\xi_{n_1}$, $\xi_{n_2}$, $\xi_{n_3}$ must be equal; with proper naming, they
can be taken to be $\xi_{n_1}$ and $\xi_{n_2}$, where $n_1 < n_2$. If $n_2 - n_1 = h$, then
$\xi_{n_1+h} = \xi_{n_1}$, and

\[ \xi_{n_1+h+1} = \frac{1}{\xi_{n_1+h} - [\xi_{n_1+h}]} = \frac{1}{\xi_{n_1} - [\xi_{n_1}]} = \xi_{n_1+1}, \]
\[ \xi_{n_1+h+2} = \frac{1}{\xi_{n_1+h+1} - [\xi_{n_1+h+1}]} = \frac{1}{\xi_{n_1+1} - [\xi_{n_1+1}]} = \xi_{n_1+2}, \]

and in general, $\xi_{k+h} = \xi_k$ for $k \ge n_1$. Thus the $\xi_k$'s are eventually
periodic. Since each $a_k$ is determined exclusively by the corresponding
$\xi_k$, the same is true of the $a_k$'s, and the proof is complete.
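The key facts of the proof, that the triples $(A_k, B_k, C_k)$ share one discriminant and take only finitely many values, can be observed for $\xi = \sqrt7$, a root of $x^2 - 7 = 0$. In this sketch the function name `quadratic_triples` is hypothetical, and the partial quotients of $\sqrt7$ are taken as given from the periodic expansion $2, 1, 1, 1, 4, \ldots$.

```python
def quadratic_triples(A, B, C, quotients):
    """Return [(A_k, B_k, C_k)] for k = 1, 2, ..., where A_k xi_k^2 + B_k xi_k + C_k = 0
    and xi is a root of A x^2 + B x + C with the given partial quotients."""
    p2, q2 = 1, 0                  # p_{k-2}, q_{k-2} (for k = 1: p_{-1}, q_{-1})
    p1, q1 = quotients[0], 1       # p_{k-1}, q_{k-1} (for k = 1: p_0, q_0)
    triples = []
    for a in quotients[1:]:        # a = a_k
        Ak = A * p1 * p1 + B * p1 * q1 + C * q1 * q1
        Bk = 2 * A * p1 * p2 + B * (p1 * q2 + p2 * q1) + 2 * C * q1 * q2
        Ck = A * p2 * p2 + B * p2 * q2 + C * q2 * q2
        triples.append((Ak, Bk, Ck))
        p1, p2 = p1 * a + p2, p1   # advance to p_k, p_{k-1}
        q1, q2 = q1 * a + q2, q1
    return triples


triples = quadratic_triples(1, 0, -7, [2] + [1, 1, 1, 4] * 5)
```

Here every triple has $B_k^2 - 4A_kC_k = 28$, and only four distinct triples occur among the first twenty.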


The relation (27) is of course valid for general $\xi$, although it was
used above only when $\xi$ is a quadratic irrationality. It provides a
proof of the following assertions.


THEOREM 9-6. If $p_k/q_k$ is a convergent of the continued fraction
expansion of $\xi$, then

\[ \xi - \frac{p_k}{q_k} = \frac{(-1)^k}{q_k(q_k\xi_{k+1} + q_{k-1})}, \tag{28} \]

\[ \frac{1}{q_k(q_k + q_{k+1})} < \left|\xi - \frac{p_k}{q_k}\right| < \frac{1}{q_kq_{k+1}}, \]

and

\[ \left|\xi - \frac{p_k}{q_k}\right| < \frac{1}{q_k^2}. \tag{29} \]

As a partial converse, we have

THEOREM 9-7. If

\[ \left|\xi - \frac pq\right| < \frac{1}{2q^2}, \tag{30} \]

then $p/q$ is a convergent of the continued fraction expansion of $\xi$.


Proof: If $p/q$ is adjacent to $r/s$ in $F_q$, then the end point of $R_q(p, q)$
lying between $p/q$ and $r/s$ is the mediant

\[ \frac{p + r}{q + s}, \]

and

\[ \left|\frac{p + r}{q + s} - \frac pq\right| = \frac{|qr - ps|}{q(q + s)} = \frac{1}{q(q + s)} \ge \frac{1}{2q^2}. \]

Hence, if (30) holds, either

\[ \frac pq \le \xi < \frac{p + r}{q + s} < \frac rs \qquad \text{or} \qquad \frac rs < \frac{p + r}{q + s} < \xi \le \frac pq, \]

and $\xi \in R_q(p, q)$, so that $p/q$ is a best approximation, and therefore a
convergent, to $\xi$.
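Theorem 9-7 can be tested by brute force for $\xi = \sqrt2$. The sketch below is only a numerical illustration: `SQRT2` is a rational approximation to $\sqrt2$ with error below $10^{-30}$ (an assumption that is harmless for the small denominators searched), and the function name `close_fractions` is hypothetical.

```python
from fractions import Fraction
from math import gcd, isqrt

# Rational stand-in for sqrt(2); the error is below 1e-30.
SQRT2 = Fraction(isqrt(2 * 10**60), 10**30)


def close_fractions(max_q):
    """Reduced fractions p/q, q <= max_q, with |sqrt(2) - p/q| < 1/(2 q^2)."""
    found = []
    for q in range(1, max_q + 1):
        base = isqrt(2 * q * q)            # floor(q * sqrt(2))
        for p in (base, base + 1):         # the only candidates within 1/q of sqrt(2)
            if gcd(p, q) == 1 and abs(SQRT2 - Fraction(p, q)) < Fraction(1, 2 * q * q):
                found.append((p, q))
    return found
```

Every fraction returned is a convergent of $\sqrt2 = 1 + \frac{1}{2 +}\frac{1}{2 +}\cdots$, as the theorem requires.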


PROBLEMS 


1. Below is an outline of a proof that the expansion of $\sqrt d$ ($d$ a positive
nonsquare integer) is periodic after $a_0$. Fill in all details. (If $\alpha = r + s\sqrt d$,
where $r$ and $s$ are rational, then $\bar\alpha = r - s\sqrt d$.)

Put $\xi = \sqrt d + [\sqrt d]$. Then $-1 < \bar\xi < 0$, and from the equation

\[ \xi_k = a_k + \frac{1}{\xi_{k+1}} \]

it follows that $-1 < \bar\xi_k < 0$ for $k \ge 1$. This in turn shows that
$a_k = [-1/\bar\xi_{k+1}]$. Now suppose that the periodicity of $\{\xi_k\}$ begins when $k = n$,
and that the period is of length $h$, so that $\xi_n = \xi_{n+h}$. It follows that
$a_{n-1} = a_{n+h-1}$, and hence that $\xi_{n-1} = \xi_{n+h-1}$, so that $\{\xi_k\}$ is periodic from
the beginning.
2. It is a consequence of Theorem 8-1 that if $\xi$ is irrational, then to each
positive integer $t$ there corresponds at least one pair of integers $x, y$ such
that

\[ |x\xi - y| < \frac1t, \qquad 1 \le x \le t. \]

Show that this becomes false, for any irrational $\xi$ and infinitely many $t$,
if the second inequality above is replaced by $1 \le x \le t/2$. [Hint: Take
$t = q_k + q_{k+1}$, and use Theorems 9-7 and 9-6.]


9-5 Application to Pell’s equation 


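THEOREM 9-8. If $N$ and $d$ are integers with $d > 0$ and $|N| < \sqrt d$,
and $d$ is not a square, then all positive solutions of the Pell equation

\[ x^2 - dy^2 = N \tag{31} \]

are such that $x/y$ is a convergent of the continued fraction expansion
of $\sqrt d$.


Proof: Suppose that $x + y\sqrt d$ is a positive solution of (31).
Then, if $N$ is positive,

\[ \frac xy - \sqrt d = \frac{N}{y(x + y\sqrt d)} < \frac{\sqrt d}{y(x + y\sqrt d)} = \frac{1}{y\left(\dfrac{x}{\sqrt d} + y\right)} = \frac{1}{y^2\left(\dfrac{x}{y\sqrt d} + 1\right)}. \]

Since $x/y > \sqrt d$, we have

\[ \left|\sqrt d - \frac xy\right| < \frac{1}{2y^2}. \tag{32} \]

If $N$ is negative, we deduce from the equation

\[ y^2 - \frac{x^2}{d} = \frac{-N}{d} \]

the relations

\[ 0 < y - \frac{x}{\sqrt d} = \frac{-N/d}{y + \dfrac{x}{\sqrt d}} < \frac{1}{y\sqrt d + x} = \frac{1}{x\left(\dfrac{y\sqrt d}{x} + 1\right)}, \]

and

\[ \left|\frac{1}{\sqrt d} - \frac yx\right| < \frac{1}{2x^2}. \tag{33} \]

Now if

\[ \sqrt d = a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2 +} \cdots, \]

then

\[ \frac{1}{\sqrt d} = 0 + \frac{1}{a_0 +}\,\frac{1}{a_1 +}\,\frac{1}{a_2 +} \cdots, \]

so that the convergents of the continued fraction expansion of $1/\sqrt d$
are $0/1$ and the reciprocals of the convergents of the continued
fraction expansion of $\sqrt d$. Using this, the inequalities (32) and (33),
and Theorem 9-7, we have the result.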
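Theorem 9-8 can be checked by direct search for a particular $d$. The following sketch takes $d = 13$ as an arbitrary choice; the function names `small_norm_solutions` and `sqrt_convergents` are hypothetical, and the convergents are generated with a standard integer recurrence that keeps each complete quotient in the form $(\sqrt d + r)/s$.

```python
from math import isqrt


def small_norm_solutions(d, max_y):
    """All positive (x, y, N) with x^2 - d y^2 = N, 0 < |N| < sqrt(d), y <= max_y."""
    sols = []
    for y in range(1, max_y + 1):
        base = isqrt(d * y * y)            # |x - y sqrt(d)| < 1 for any such solution
        for x in (base, base + 1):
            n = x * x - d * y * y
            if n != 0 and n * n < d:
                sols.append((x, y, n))
    return sols


def sqrt_convergents(d, count):
    """First `count` convergents (p_k, q_k) of sqrt(d), d not a square."""
    a0 = isqrt(d)
    r, s, a = 0, 1, a0
    p_prev, q_prev, p, q = 1, 0, a0, 1
    out = [(p, q)]
    for _ in range(count - 1):
        r = s * a - r
        s = (d - r * r) // s
        a = (a0 + r) // s
        p, p_prev = p * a + p_prev, p
        q, q_prev = q * a + q_prev, q
        out.append((p, q))
    return out


solutions = small_norm_solutions(13, 200)
convs = set(sqrt_convergents(13, 12))
```

Each solution found by the search has $x/y$ among the convergents of $\sqrt{13}$, in agreement with the theorem.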

This theorem provides an effective method of determining all
integers $N$, numerically smaller than $\sqrt d$, for which equation (31)
is solvable, for it happens that the sequence $\{p_k^2 - dq_k^2\}$ is eventually
periodic, and consequently only finitely many values of $k$ need to be
examined. To see this, put $\xi = \sqrt d$ and

\[ \xi = \frac{p_{k-1}\xi_k + p_{k-2}}{q_{k-1}\xi_k + q_{k-2}}. \tag{34} \]

Solving for $\xi_k$ and rationalizing the denominator, we can write

\[ \xi_k = \frac{\sqrt d + r_k}{s_k}, \]

where $r_k$ and $s_k$ are rational numbers. Substituting this back into
(34), and replacing $k$ by $k + 1$ throughout, we have

\[ \sqrt d = \frac{p_k(\sqrt d + r_{k+1}) + p_{k-1}s_{k+1}}{q_k(\sqrt d + r_{k+1}) + q_{k-1}s_{k+1}}, \]

or

\[ (q_kr_{k+1} + q_{k-1}s_{k+1} - p_k)\sqrt d - (p_kr_{k+1} + p_{k-1}s_{k+1} - q_kd) = 0. \]

The rational and irrational parts must separately be zero, so

\[ q_kr_{k+1} + q_{k-1}s_{k+1} = p_k, \qquad p_kr_{k+1} + p_{k-1}s_{k+1} = q_kd. \]

The determinant of this system is $q_kp_{k-1} - q_{k-1}p_k = (-1)^k$, so that

\[ \begin{aligned} r_{k+1} &= (-1)^k(p_kp_{k-1} - q_kq_{k-1}d), \\ s_{k+1} &= (-1)^k(q_k^2d - p_k^2). \end{aligned} \tag{35} \]

Now the numbers $r_k$ and $s_k$ are uniquely determined by $\xi_k$; since
$\{\xi_k\}$ is eventually periodic, the same is true of $\{s_k\}$, and the eventual
periodicity of $\{p_k^2 - dq_k^2\}$ follows from the second equation of (35).
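Equations (35) can be compared with an independent computation of $r_k$ and $s_k$. In the sketch below the recurrences $r_{k+1} = s_ka_k - r_k$, $s_{k+1} = (d - r_{k+1}^2)/s_k$ come from rationalizing $1/(\xi_k - a_k)$ directly; they are an assumption of this example, as is the function name `norms_checked`.

```python
from math import isqrt


def norms_checked(d, count):
    """Return [p_k^2 - d q_k^2 for k = 0 .. count-1] for sqrt(d), checking
    both equations (35) against directly computed r_{k+1}, s_{k+1}."""
    a0 = isqrt(d)
    r, s, a = 0, 1, a0                      # xi_0 = (sqrt(d) + 0) / 1
    p_prev, q_prev, p, q = 1, 0, a0, 1      # p_{-1}, q_{-1}, p_0, q_0
    norms = []
    for k in range(count):
        r = s * a - r                       # r_{k+1}
        s = (d - r * r) // s                # s_{k+1}
        sign = (-1) ** k
        if r != sign * (p * p_prev - q * q_prev * d):
            raise AssertionError("first equation of (35) fails")
        if s != sign * (q * q * d - p * p):
            raise AssertionError("second equation of (35) fails")
        norms.append(p * p - d * q * q)
        a = (a0 + r) // s                   # a_{k+1}
        p, p_prev = p * a + p_prev, p
        q, q_prev = q * a + q_prev, q
    return norms
```

For $d = 13$ the values $p_k^2 - 13q_k^2$ repeat with period 10, twice the period 5 of the expansion of $\sqrt{13}$.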

The discussion of Pell's equation with $N = \pm1$, in Chapter 8, had
the serious drawback that no effective method was given for finding
the fundamental solution, nor even of deciding when one exists for
$N = -1$. The results obtained above entirely clarify these points:
the first solution encountered, being the smallest, is the fundamental
solution, and the equation $x^2 - dy^2 = -1$ is solvable if and only if a
solution exists among the convergents to $\sqrt d$ up to the end of the
second period. (For $\{s_k\}$ becomes periodic at the same point as
$\{\xi_k\}$, and $\{(-1)^{k-1}s_k\}$ has period at most twice that of $\{s_k\}$.) It can
be shown by the method sketched in Problem 4, below, that $s_k = 1$
for the first time at the end of the first period, so that the preceding
convergent is the fundamental solution of one of the equations; if
for this convergent $p_k^2 - dq_k^2 = 1$, then the equation $x^2 - dy^2 = -1$
is unsolvable, while if $p_k^2 - dq_k^2 = -1$, the convergents preceding
ends of periods are alternately solutions for $N = -1$ and $N = 1$.
PROBLEMS 


1. For what $N$ with $|N| < \sqrt7$ is the equation $x^2 - 7y^2 = N$ solvable?

2. Show that the numbers $r_k$ and $s_k$ defined in this section are positive
integers. [Hint: Use equation (28).]

3. Find the fundamental solution of $x^2 - 95y^2 = 1$; of $x^2 - 74y^2 = 1$.

4. (a) Using Problem 1, Section 9-4, show that if the length of the period
in the expansion of $\sqrt d$ is $h$, then $s_h = 1$, and hence that

\[ p_{h-1}^2 - dq_{h-1}^2 = (-1)^h. \]

Thus $x^2 - dy^2 = -1$ is solvable if $h$ is odd.

(b) Using the fact that the numbers $\xi, \xi_1, \ldots, \xi_{h-1}$ are distinct, show
that $s_k > 1$ if $1 \le k \le h - 1$. Deduce that the equation $x^2 - dy^2 = -1$
is solvable only if $h$ is odd.


9-6 Equivalence of numbers. Because each element of the
sequence $\{\xi_k\}$ depends only on the preceding one, and because the
defining rule

\[ \xi_k = [\xi_k] + \frac{1}{\xi_{k+1}} \]

is the same for all $k \ge 1$, it is clear that if

\[ \xi = a_0 + \frac{1}{a_1 +} \cdots \frac{1}{a_{n-1} +}\,\frac{1}{\xi_n} = a_0 + \frac{1}{a_1 +} \cdots, \]

then

\[ \xi_n = a_n + \frac{1}{a_{n+1} +} \cdots. \]

If we are interested in the possibility of finding infinitely many
solutions of the inequality

\[ \left|\xi - \frac pq\right| < \frac{1}{cq^2}, \]

equation (28) shows that we need only examine the numbers $\xi_k$ with
large $k$. For this reason, we shall term two irrational numbers $\xi$
and $\xi'$ equivalent if, for some $j$ and $k$, $\xi'_j = \xi_k$. Then $\xi'_{j+m} = \xi_{k+m}$
for $m \ge 0$, and by the above remark, this means that if

\[ \xi = a_0 + \frac{1}{a_1 +} \cdots \frac{1}{a_{k-1} +}\,\frac{1}{a_k +}\,\frac{1}{a_{k+1} +} \cdots, \]

then

\[ \xi' = b_0 + \frac{1}{b_1 +} \cdots \frac{1}{b_{j-1} +}\,\frac{1}{a_k +}\,\frac{1}{a_{k+1} +} \cdots, \]

so that

\[ \xi = \frac{p_{k-1}\xi_k + p_{k-2}}{q_{k-1}\xi_k + q_{k-2}}, \qquad \xi' = \frac{p'_{j-1}\xi_k + p'_{j-2}}{q'_{j-1}\xi_k + q'_{j-2}}. \]


THEOREM 9-9. Two irrational numbers $\xi$ and $\xi'$ are equivalent,
in the sense that their continued fraction expansions are identical
from some points on, if and only if there are integers $A$, $B$, $C$, and
$D$ such that

\[ \xi' = \frac{A\xi + B}{C\xi + D}, \qquad \text{where } AD - BC = \pm1. \tag{36} \]


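Proof: Eliminating $\xi_k$ from the equations preceding the theorem
gives

\[ \frac{-q_{k-2}\xi + p_{k-2}}{q_{k-1}\xi - p_{k-1}} = \frac{-q'_{j-2}\xi' + p'_{j-2}}{q'_{j-1}\xi' - p'_{j-1}}, \]

or

\[ \xi = \frac{A\xi' + B}{C\xi' + D}, \]

where

\[ \begin{aligned} A &= p_{k-1}q'_{j-2} - p_{k-2}q'_{j-1}, & B &= p_{k-2}p'_{j-1} - p_{k-1}p'_{j-2}, \\ C &= q_{k-1}q'_{j-2} - q_{k-2}q'_{j-1}, & D &= q_{k-2}p'_{j-1} - q_{k-1}p'_{j-2}. \end{aligned} \]

A simple calculation shows that

\[ AD - BC = (p'_{j-1}q'_{j-2} - p'_{j-2}q'_{j-1})(p_{k-1}q_{k-2} - p_{k-2}q_{k-1}) = \pm1. \]

Solving this relation for $\xi'$ gives $\xi' = (D\xi - B)/(-C\xi + A)$, which is
again a relation of the same kind, with the same determinant $AD - BC$.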


To complete the proof, suppose that equation (36) holds. By
replacing $A$, $B$, $C$, and $D$ by their negatives if necessary, we may
suppose also that $C\xi + D > 0$. Substituting the value of $\xi$ from
equation (26) into (36) gives

\[ \xi' = \frac{a\xi_k + b}{c\xi_k + d}, \tag{37} \]

where

\[ \begin{aligned} a &= Ap_{k-1} + Bq_{k-1}, & b &= Ap_{k-2} + Bq_{k-2}, \\ c &= Cp_{k-1} + Dq_{k-1}, & d &= Cp_{k-2} + Dq_{k-2}, \end{aligned} \]

and

\[ ad - bc = (AD - BC)(p_{k-1}q_{k-2} - p_{k-2}q_{k-1}) = \pm1. \]

By the inequality (29),

\[ p_{k-1} = q_{k-1}\xi + \frac{\delta_{k-1}}{q_{k-1}}, \qquad p_{k-2} = q_{k-2}\xi + \frac{\delta_{k-2}}{q_{k-2}}, \]

where

\[ |\delta_{k-1}| < 1, \qquad |\delta_{k-2}| < 1. \]

Hence

\[ c = (C\xi + D)q_{k-1} + \frac{C\delta_{k-1}}{q_{k-1}}, \qquad d = (C\xi + D)q_{k-2} + \frac{C\delta_{k-2}}{q_{k-2}}. \]

Since $C\xi + D$, $q_{k-1}$, and $q_{k-2}$ are positive, and since $q_{k-1} > q_{k-2}$
and $q_k \to \infty$ with $k$, we have $c > d > 0$ for $k$ sufficiently large. But
by Theorem 8-14, this means that $a/c$ and $b/d$ are adjacent in $F_c$,
and from (37) and the fact that $\xi_k > 1$, it is seen that $\xi'$ lies between
$a/c$ and $b/d$, and is closer to $a/c$ than is the mediant $(a + b)/(c + d)$.
It follows that $\xi' \in R_{c-1}(b, d)$ and $\xi' \in R_c(a, c)$, so that $b/d$ and $a/c$ are
successive convergents of the continued fraction expansion of $\xi'$:

\[ a = p'_{j-1}, \qquad b = p'_{j-2}, \qquad c = q'_{j-1}, \qquad d = q'_{j-2}. \]

But from

\[ \xi' = \frac{p'_{j-1}\xi_k + p'_{j-2}}{q'_{j-1}\xi_k + q'_{j-2}} = \frac{p'_{j-1}\xi'_j + p'_{j-2}}{q'_{j-1}\xi'_j + q'_{j-2}} \]

it follows that $\xi_k = \xi'_j$, as was to be proved.
In the course of the proof, the following useful fact emerged: 


THEOREM 9-10. If $a$, $b$, $c$, and $d$ are integers, and

\[ \xi = \frac{a\xi' + b}{c\xi' + d}, \qquad ad - bc = \pm1, \qquad \xi' > 1, \qquad c > d > 0, \]

then $b/d$ and $a/c$ are successive convergents of the continued fraction
expansion of $\xi$, and $\xi'$ is the corresponding complete quotient: for
suitable $k$,

\[ a = p_{k-1}, \qquad b = p_{k-2}, \qquad c = q_{k-1}, \qquad d = q_{k-2}, \qquad \xi' = \xi_k. \]


We shall use the symbol "$\cong$" to designate equivalence in the regular
continued fraction sense. The notion of equivalence, together with
equation (28), can be used to gain new insight concerning the Markov
constant $M(\xi)$, which was defined in Section 8-4 as the upper limit
of those numbers $\lambda$ such that the inequality

\[ \left|\xi - \frac pq\right| < \frac{1}{\lambda q^2} \]

has infinitely many solutions $p, q$. From (28), it is clear that
$|q_k^2(\xi - p_k/q_k)|$ is approximately inversely proportional to $\xi_{k+1}$, so
that $M(\xi)$ will probably have its smallest value for those $\xi$ for which
$a_k = 1$ for all large $k$. Now if

\[ \xi = 1 + \frac{1}{1 +}\,\frac{1}{1 +} \cdots, \]

then

\[ \xi = 1 + \frac1\xi, \qquad \xi^2 - \xi - 1 = 0, \qquad \xi = \frac{1 + \sqrt5}{2}. \]

These remarks lead one to expect that the first part of the following 
theorem might be true. 


THEOREM 9-11. If $\xi \cong \xi'$, then $M(\xi) = M(\xi')$. If
$\xi \cong (1 + \sqrt5)/2$, then $M(\xi) = \sqrt5$. If $\xi$ is irrational and not equivalent
to $(1 + \sqrt5)/2$, then $M(\xi) \ge \sqrt8$. If $\xi \cong \sqrt2$, then $M(\xi) = \sqrt8$.
If $\xi$ is not equivalent to either $(1 + \sqrt5)/2$ or $\sqrt2$, and is irrational,
then $M(\xi) \ge 17/6$.


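Proof: By (28),

\[ M(\xi) = \limsup_{k \to \infty}\left(\xi_{k+1} + \frac{q_{k-1}}{q_k}\right). \]

Now

\[ \xi_{k+1} = a_{k+1} + \frac{1}{a_{k+2} +}\,\frac{1}{a_{k+3} +} \cdots, \]

and

\[ \frac{q_{k-1}}{q_k} = \frac{q_{k-1}}{q_{k-1}a_k + q_{k-2}} = \cfrac{1}{a_k + \cfrac{q_{k-2}}{q_{k-1}}} = \cfrac{1}{a_k + \cfrac{1}{a_{k-1} + \cfrac{q_{k-3}}{q_{k-2}}}} = \cdots = \frac{1}{a_k +}\,\frac{1}{a_{k-1} +} \cdots \frac{1}{a_1}, \]

so that

\[ M(\xi) = \limsup_{k \to \infty}\left\{\left(a_{k+1} + \frac{1}{a_{k+2} +}\,\frac{1}{a_{k+3} +} \cdots\right) + \left(\frac{1}{a_k +}\,\frac{1}{a_{k-1} +} \cdots \frac{1}{a_1}\right)\right\}. \tag{38} \]

If $\xi' \cong \xi$, then $\xi'_j = \xi_k$ and $a'_j = a_k$ for all sufficiently large $j$ and $k$
for which $j - k$ has a suitable fixed value $h$. If the convergents of $\xi'$
are $p'_j/q'_j$, then for such $j$ and $k$ the continued fraction expansions of
$q_{k-1}/q_k$ and $q'_{j-1}/q'_j$ have the same partial quotients at the beginning,
and the interval of agreement can be made arbitrarily long by
choosing $j$ and $k$ sufficiently large. Suppose that they agree in the first
$l + 1$ partial quotients, that $r_i/s_i$ ($i = 0, 1, \ldots, l$) are the common
convergents, and that

\[ \frac{q_{k-1}}{q_k} = \frac{r_{l-1}\alpha_l + r_{l-2}}{s_{l-1}\alpha_l + s_{l-2}}, \qquad \frac{q'_{j-1}}{q'_j} = \frac{r_{l-1}\alpha'_l + r_{l-2}}{s_{l-1}\alpha'_l + s_{l-2}}. \]

Then using the fact that $[\alpha_l] = [\alpha'_l] \ge 1$, we have

\[ \left|\frac{q_{k-1}}{q_k} - \frac{q'_{j-1}}{q'_j}\right| = \frac{|\alpha_l - \alpha'_l|}{(s_{l-1}\alpha_l + s_{l-2})(s_{l-1}\alpha'_l + s_{l-2})} < \frac{1}{s_{l-1}^2}, \]

so that

\[ \lim_{\substack{k \to \infty \\ j - k = h}}\left(\frac{q_{k-1}}{q_k} - \frac{q'_{j-1}}{q'_j}\right) = 0, \]

and so $M(\xi) = M(\xi')$.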
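To prove the second assertion of Theorem 9-11, we need only
notice that

\[ M\!\left(\frac{1 + \sqrt5}{2}\right) = \lim_{k \to \infty}\Biggl\{\left(1 + \frac{1}{1 +}\,\frac{1}{1 +} \cdots\right) + \underbrace{\left(\frac{1}{1 +}\,\frac{1}{1 +} \cdots \frac11\right)}_{k \text{ terms}}\Biggr\} = \frac{1 + \sqrt5}{2} + \frac{1}{(1 + \sqrt5)/2} = \sqrt5, \]

by (38).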
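To prove the third part, we may suppose that $a_{k+1} \ge 2$ for infinitely
many indices $k$. If $a_{k+1} \ge 3$ for infinitely many $k$, it is clear that
$M(\xi) \ge 3$. Since $\sqrt8 < 3$, we need only consider those $\xi$'s for which
$a_k$ is either 1 or 2 for all large $k$. If there are infinitely many 1's and
2's, there are infinitely many values of $k$ such that $a_k = 1$, $a_{k+1} = 2$.
But then, since the value of a continued fraction is always at least
equal to its convergent with index 2,

\[ a_{k+1} + \frac{1}{a_{k+2} +}\,\frac{1}{a_{k+3} +} \cdots \ge 2 + \cfrac{1}{2 + \cfrac11} = \frac73, \]

and

\[ \frac{1}{a_k +}\,\frac{1}{a_{k-1} +} \cdots \frac{1}{a_1} \ge \cfrac{1}{1 + \cfrac{1}{a_{k-1}}} \ge \cfrac{1}{1 + \cfrac11} = \frac12, \]

so that

\[ M(\xi) \ge \tfrac73 + \tfrac12 = \tfrac{17}{6} = 2.833\ldots > \sqrt8. \]

On the other hand, if $a_k = 2$ for all large $k$, then

\[ \xi \cong 2 + \frac{1}{2 +}\,\frac{1}{2 +} \cdots = 1 + \sqrt2 \cong \sqrt2, \]

and

\[ M(\xi) = \lim_{k \to \infty}\Biggl\{\left(2 + \frac{1}{2 +}\,\frac{1}{2 +} \cdots\right) + \underbrace{\left(\frac{1}{2 +}\,\frac{1}{2 +} \cdots \frac12\right)}_{k \text{ terms}}\Biggr\} = (\sqrt2 + 1) + (\sqrt2 - 1) = \sqrt8. \]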

To clarify the significance of Theorem 9-11, we make use of the 
concept of countability, introduced by G. Cantor. Let S be an 
arbitrary infinite set. If it is possible to establish a one-to-one cor- 
respondence between the elements of S and the set of positive integers, 
then S is said to be countable. Another formulation of this require- 
ment is that it should be possible to arrange the elements of S in a 
sequence having a first element, second element, and so on, in such a 
way that each element of S occurs only finitely far out in the sequence. 
The integers are countable, since every integer occurs in the sequence 

\[ 0, 1, -1, 2, -2, \ldots, n, -n, \ldots. \]

The rational numbers between 0 and 1 are also countable, although
they cannot be arranged by size. A suitable sequence is given by

\[ \frac12, \frac13, \frac23, \frac14, \frac34, \frac15, \frac25, \frac35, \frac45, \ldots, \]

in which the reduced fractions with denominator 2 are listed first,
then those with denominator 3, etc. On the other hand, the real
numbers between 0 and 1 are not countable (i.e., the set of such 
numbers is uncountable). For each such number is uniquely repre- 
sented by an infinite decimal (which may consist exclusively of 0’s 
from some point on, but not of 9’s), and conversely. Suppose that 
the set could be arranged in a sequence, say $a_1, a_2, \ldots$, and let
the decimal expansions be

\[ \begin{aligned} a_1 &= 0.a_{11}a_{12}a_{13}\ldots, \\ a_2 &= 0.a_{21}a_{22}a_{23}\ldots, \\ a_3 &= 0.a_{31}a_{32}a_{33}\ldots, \\ &\ \ \vdots \end{aligned} \]


where the a,; are digits. Let b = 0.b,;bob3... be the real number 
determined according to the following rule: forj = 1,2,..., 


\[ b_j = \begin{cases} 0 & \text{if } a_{jj} \ne 0, \\ 1 & \text{if } a_{jj} = 0. \end{cases} \]
Then since $b_j \ne a_{jj}$, it is clear that $b \ne a_j$, and since this is true for
every $j$, $b$ is not in the sequence $a_1, a_2, \ldots$, so that the sequence does
not contain every real number in the interval $0 < x < 1$.
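The two enumerations and the diagonal construction described above can be sketched in Python. The function names are hypothetical, and the digit rule in `diagonal_digits` (0 when the diagonal digit is nonzero, 1 otherwise) is one admissible choice assumed for the illustration; any rule that makes $b_j \ne a_{jj}$ and avoids a tail of 9's serves the same purpose.

```python
from itertools import count, islice
from math import gcd

def integers():
    """Yield 0, 1, -1, 2, -2, ... so that every integer occurs exactly once."""
    yield 0
    for n in count(1):
        yield n
        yield -n

def reduced_fractions():
    """Yield the reduced fractions p/q in (0, 1) as pairs (p, q), by increasing q."""
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield (p, q)

def diagonal_digits(rows):
    """Given digit rows a_1, a_2, ..., return digits b_j with b_j != a_jj."""
    return [0 if row[j] != 0 else 1 for j, row in enumerate(rows)]

print(list(islice(integers(), 7)))
print(list(islice(reduced_fractions(), 9)))

rows = [[1, 4, 1], [0, 0, 0], [9, 9, 5]]   # first three digits of three expansions
print(diagonal_digits(rows))
```

Only finitely many rows can be processed, so the last function illustrates the rule rather than the full argument: the number $b$ differs from the $j$th listed number in its $j$th digit.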

If it can be shown that one set is countable, while another is not, 
then there must be some element of the second set which is not in the 
first. Moreover, every subset of a countable set is countable. 

It is relevant to our present purpose to note that the quadruples 
of integers $(A, B, C, D)$ such that $|AD - BC| = 1$ are countable.
For without the restriction, we have the larger set of all quadruples
of integers, and these can be arranged in a sequence by first writing
$(0, 0, 0, 0)$, then all quadruples whose elements are 0 or $\pm 1$, then
those whose elements are 0, $\pm 1$ or $\pm 2$, etc. It follows from Theorem
9-9 that the set of numbers equivalent to a fixed number is countable, 
and it follows easily from this that the set of numbers equivalent to 
any of a fixed countable set of numbers is itself countable. 

Theorem 9-11 contains the first two of an infinite sequence of
assertions about the values less than 3 assumed by $M(\xi)$. Markov
showed that there are only countably many such values, that their
sole limit point is 3, and that each such value corresponds precisely
to the set of numbers equivalent to a certain quadratic irrationality.
There is no really simple proof known. For later purposes, we show
that these results cannot be extended to values $M(\xi) = 3$.


THEOREM 9-12. There are uncountably many numbers $\xi$ such
that $M(\xi) = 3$.


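Proof: Let $r_1, r_2, \ldots$ be a strictly increasing sequence of positive
integers, and let

\[ \xi = \underbrace{\frac{1}{1+}\cdots\frac{1}{1+}}_{r_1}\,\frac{1}{2+}\,\frac{1}{2+}\,\underbrace{\frac{1}{1+}\cdots\frac{1}{1+}}_{r_2}\,\frac{1}{2+}\,\frac{1}{2+}\cdots, \tag{39} \]

where there are $r_1$ partial quotients 1, then two 2's, then $r_2$ 1's, then
two 2's, then $r_3$ 1's, etc. Thus two blocks consisting entirely of 1's
are always separated by two 2's, and the blocks of 1's become longer
as we move out in the sequence. Let

\[ \theta_k = \xi_{k+1} + \frac{q_{k-1}}{q_k} = \Bigl( a_{k+1} + \frac{1}{a_{k+2}+}\cdots \Bigr) + \Bigl( \frac{1}{a_k+}\,\frac{1}{a_{k-1}+}\cdots\frac{1}{a_1} \Bigr). \]

If we choose $k$ so that $a_{k+1} = 1$, then clearly $\xi_{k+1} < 2$, $q_{k-1}/q_k < 1$,
and $\theta_k < 3$. If $k$ runs through a sequence of indices such that $a_{k+1} =
a_{k+2} = 2$, then

\[ \lim \theta_k = \Bigl( 2 + \frac{1}{2+}\,\frac{1}{1+}\,\frac{1}{1+}\cdots \Bigr) + \Bigl( \frac{1}{1+}\,\frac{1}{1+}\cdots \Bigr) = 2 + \frac{1}{2 + (\sqrt{5} - 1)/2} + \frac{\sqrt{5} - 1}{2} = 3, \]

while if $k$ runs through a sequence of indices for which $a_k = a_{k+1} = 2$,
then

\[ \lim \theta_k = \Bigl( 2 + \frac{1}{1+}\,\frac{1}{1+}\cdots \Bigr) + \Bigl( \frac{1}{2+}\,\frac{1}{1+}\,\frac{1}{1+}\cdots \Bigr) = 2 + \frac{1}{(1 + \sqrt{5})/2} + \frac{1}{2 + (\sqrt{5} - 1)/2} = 3. \]

Hence $M(\xi) = \limsup \theta_k = 3$.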
To complete the proof, it is required to show that the set of
inequivalent $\xi$'s defined as in (39) is not countable. Now $\xi$ and $\xi'$ are
equivalent if and only if the sequences $r_1, r_2, \ldots$ and $r_1', r_2', \ldots$
associated with them are identical from some point on, so that we
can transfer the notion of equivalence from the numbers $\xi$ and $\xi'$ to
the sequences $\{r_k\}$ and $\{r_k'\}$. Suppose that the inequivalent
sequences among all the increasing sequences of positive integers can
themselves be arranged in a sequence, say $R_1, R_2, \ldots$, where $R_i$
stands for the sequence $r_{i1}, r_{i2}, \ldots$, with $r_{i1} < r_{i2} < \cdots$. With
proper naming we can suppose that $R_1$ is the sequence $1, 2, 3, \ldots$ of
all positive integers in order. If $i > 1$, $R_i$ is not equivalent to $R_1$, and
there are therefore infinitely many positive integers not included in it.

For $i > 1$ let $S_i = \{s_{ik}\}$ be the sequence complementary to $R_i$;
that is, the positive integers, ordered by size, which do not occur in $R_i$.
Each $S_i$ is an infinite sequence. Now define a sequence $T$ as follows.
Pick $t_1$ in $S_2$, and then successively choose $t_2, t_3, \ldots$ so that

\[ \begin{aligned} & t_1 \in S_2, \quad t_1 < t_2 \in S_3, \\ & 1 + t_2 < t_3 \in S_2, \quad t_3 < t_4 \in S_3, \quad t_4 < t_5 \in S_4, \\ & 1 + t_5 < t_6 \in S_2, \quad t_6 < t_7 \in S_3, \quad t_7 < t_8 \in S_4, \quad t_8 < t_9 \in S_5, \\ & \qquad\cdots \end{aligned} \]

From this scheme it is apparent that $T$ is an increasing sequence of
integers, infinitely many of which are contained in $S_k$, and therefore
not contained in $R_k$, for arbitrary $k \ge 2$. Hence $T$ is certainly not
equivalent to any of $R_2, R_3, \ldots$. Since each element $t_3, t_6, t_{10}, \ldots$ of
$T$ which lies in $S_2$ exceeds its predecessor in $T$ by more than one,
$T$ is also not equivalent to $R_1$. Hence $T$ is not equivalent to any $R_k$,
contrary to the hypothesis that the sequence $\{R_i\}$ contains an element
equivalent to any increasing sequence of positive integers.
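The first half of the proof lends itself to a numerical check. The sketch below builds the partial quotients of a number of the kind defined in (39), taking the block lengths $r_i = i$ as an arbitrary choice made for the illustration, and evaluates $\theta_k$ from a finite truncation. The names `block_quotients`, `cf_value`, and `theta` are assumptions introduced here.

```python
from fractions import Fraction

def block_quotients(block_lengths):
    """Partial quotients a_1, a_2, ...: r_1 ones, two 2's, r_2 ones, two 2's, ..."""
    a = []
    for r in block_lengths:
        a.extend([1] * r)
        a.extend([2, 2])
    return a

def cf_value(quotients):
    """Exact value of the finite continued fraction [a0; a1, ..., an]."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value

def theta(a, k):
    """theta_k = xi_{k+1} + q_{k-1}/q_k for the list a, where a[0] is a_1.

    xi_{k+1} is truncated at the end of the list, so the result is only
    an approximation when k is close to len(a).
    """
    xi = cf_value(a[k:])                    # a_{k+1}, a_{k+2}, ...
    ratio = 1 / cf_value(a[k - 1::-1])      # [0; a_k, ..., a_1]
    return xi + ratio

a = block_quotients(range(1, 31))           # block lengths 1, 2, ..., 30

# Indices k with a_{k+1} = a_{k+2} = 2; theta_k approaches 3 along them.
ks = [k for k in range(1, len(a) - 1) if a[k] == 2 and a[k + 1] == 2]
print([float(theta(a, k)) for k in ks[:5]], float(theta(a, ks[24])))

# Indices k with a_{k+1} = 1; theta_k stays below 3.
print(all(theta(a, k) < 3 for k in range(2, 60) if a[k] == 1))
```

The uncountability argument in the second half of the proof is not something a finite computation can confirm; the sketch only illustrates why $\limsup \theta_k = 3$.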


PROBLEMS 


1. Are the numbers $\sqrt{5}$ and $(1 + \sqrt{5})/2$ equivalent? What about
$\sqrt{3}$ and $(1 + \sqrt{3})/2$?

2. Show that if $\xi$ is irrational, then at least one of any two consecutive
convergents to $\xi$ satisfies the inequality

\[ \left| \xi - \frac{p}{q} \right| < \frac{1}{2q^2}. \]




REFERENCES 


Section 9-6 


Markov’s work appears in Mathematische Annalen (Leipzig) 15, 381-407 
(1879) and 17, 379-400 (1880). A quite different treatment was given by 
J. W. S. Cassels, Annals of Mathematics 50, 676-685 (1949).


SUPPLEMENTARY READING 


Chapters 1-5


Davenport, H., The Higher Arithmetic, London: Hutchinson & Co.
(Publishers), Ltd., 1952.

Dickson, L. E., History of the Theory of Numbers, Washington: Carnegie
Institution of Washington, 1919. Reprinted, Chelsea Publishing Company,
New York, 1950.

Griffin, H., Elementary Theory of Numbers, New York: McGraw-Hill
Book Company, Inc., 1954.

Hardy, G. H., and E. M. Wright, An Introduction to the Theory of Numbers,
3rd edition, New York: Oxford University Press, 1954.

Jones, B. W., The Theory of Numbers, New York: Rinehart & Company,
Inc., 1955.

Nagell, T., Introduction to Number Theory, New York: John Wiley & Sons,
Inc., 1951.

Ore, O., Number Theory and Its History, New York: McGraw-Hill Book
Company, Inc., 1948.

Stewart, B. M., Theory of Numbers, New York: The Macmillan Company,
1952.

Vinogradov, I. M., Elements of Number Theory, translation of 5th Russian
edition, New York: Dover Publications, 1954.

Wright, H. N., First Course in Theory of Numbers, John Wiley and Sons,
Inc., New York, 1939.


Chapter 6


Hardy and Wright, op. cit.

Landau, E., Handbuch der Lehre von der Verteilung der Primzahlen, vol. 1,
Leipzig: Teubner Verlagsgesellschaft, 1909. Reprinted, Chelsea
Publishing Company, New York, 1953.

Landau, E., Vorlesungen über Zahlentheorie, vol. 1, part 1, Leipzig: S.
Hirzel Verlag, 1927. Reprinted as Elementare Zahlentheorie, Chelsea
Publishing Company, New York, 1950.


Chapter 7


Hardy and Wright, op. cit.

Landau, E., Vorlesungen über Zahlentheorie.


Chapter 8


Cahen, E., Théorie des Nombres, Paris: Hermann & Cie., 1914-1924.

Landau, E., Vorlesungen über Zahlentheorie.

Nagell, T., op. cit.


Chapter 9


Hardy and Wright, op. cit.

Koksma, J. F., Diophantische Approximationen, Berlin: Springer-Verlag
OHG, 1936. (Ergebnisse der Mathematik, vol. 4, no. 4.) Reprinted,
Chelsea Publishing Company, New York, 1951.

Perron, O., Die Lehre von den Kettenbrüchen, 3rd edition, Stuttgart:
Teubner Verlagsgesellschaft, 1954. Second edition reprinted, Chelsea
Publishing Company, New York, 1950.

Zuellig, J., Geometrische Deutung Unendlicher Kettenbrüche, Zurich: Orell
Füssli Verlag, 1928.


LIST OF SYMBOLS 


$\pi(x)$, number of primes not exceeding $x$, 3

$\mid$, $\nmid$, divides, does not divide, 14

GCD, greatest common divisor, 14

$(a, b)$, GCD of $a$ and $b$, 14

LCM, least common multiple, 23

(a, b), LCM of $a$ and $b$, 23

$\equiv$ (mod $m$), 24

$\varphi(m)$, Euler's function, 28

$\operatorname{ord}_m a$, order of $a$ (mod $m$), 43

$(a/p)$, Legendre symbol, 45

$\|$, exactly divides, 52

$\lambda(m)$, 53

ind, a, 56

$\gamma$, Euler's constant, 75

$(a/b)$, Jacobi symbol, 77

$\sigma(n)$, sum of divisors of $n$, 81

$\tau(n)$, number of divisors of $n$, 81

$\mu(n)$, Möbius function, 86

$[x]$, greatest integer not exceeding $x$, 89

$O$, $o$, 92

$\sim$, 92

$\zeta(s)$, Riemann's function, 119

P 2 (n), 128

$R[i]$, Gaussian integers, 129

$N$, norm of a Gaussian integer, 129

e) (n), 132

$R[\sqrt{d}]$, quadratic integers, 138

$M(\xi)$, Markov's constant, 149

$F_n$, Farey sequence, 154

Rn (p,q), 161

P;,/Q;, best approximations, 163

$p_k/q_k$, convergents, 170

&, equivalent real numbers, 187


INDEX 


Additive number theory, 3 
Algorithm, 15 
Associates, 129 


Base, 12 
Bertrand’s conjecture, 108 
Best approximation, 162 


Chinese Remainder Theorem, 35 
Common divisor, 14 
Complete quotients, 175 
Composite number, 18 
Congruence, modulo m, 24 
identical, 39 
linear, 31 
Convergents, 170 
Countable set, 189 
Cross-classification, principle of, 84 


Diophantine approximations, 4, 159 
Diophantine equations, 3 

linear, 20 
Dirichlet’s divisor problem, 118 
Dirichlet’s theorem, 76 
Division modulo m, 40 


Equivalence classes, 25 
Equivalence relation, 24 
Equivalent irrational numbers, 184 
Euclidean algorithm, 14 

Euler’s constant, 95

Euler’s criterion, 46 

Euler’s theorem, 42 

Euler’s $\varphi$-function, 28


Farey sequence, 154 
Fermat’s theorem, 42 
Fibonacci sequence, 7 


Fundamental solution, of Pell’s equation, 142


Gaussian integers, 129 
Gauss’s lemma, 67 
Greatest common divisor, 14 


Hurwitz’ theorem, 153 
Indefinite quadratic form, 149 
Index, 56 

Infinitude of primes, 6, 9
Integer of $R[\sqrt{d}]$, 138

Jacobi symbol, 77 


Lagrange’s theorem, 42 
Lattice points, 119 


Least common multiple, 23 


Legendre symbol, 45, 66 


Mediant, 155 

Mersenne primes, 83 

Möbius function, 86

Möbius inversion formula, 87
Multiplicative function, 28 
Multiplicative number theory, 1 


Norm, 129, 138 


nth power residue, 58 




Number of divisors of an integer, 2 
Number-theoretic function, 81 


Order of a (mod m), 43 
Partial quotients, 175 


Pell’s equation, 137, 181 
Perfect numbers, 82 


Prime, 2
Gaussian, 129
of $R[\sqrt{d}]$, 138
twin, 69
Prime Number Theorem, 3
Primitive root, 49
Primitive $\lambda$-root, 55


Quadratic field, 138
Quadratic reciprocity law, 69
Quadratic residue, 45


Radix, 9
Regular continued fraction, 170
Relatively prime, 16
Representable, 126
Representation, proper, 126
Residue system, complete, 27
reduced, 27
Residue classes, 27
Riemann $\zeta$-function, 119


Sieve of Eratosthenes, 97


Unique Factorization Theorem, 18
for Gaussian integers, 131
Units, 129, 138
Universal exponent, 53


Wilson’s theorem, 44


Errata in Topics in Number Theory, Vol. I 


page line replace with 

11 7 1-1, 1-1, there are c, and r such that 
14 —5 0<ri<a 0<m <b 

15 9 (b) (c) 

16 12 m0 m>0 

16 13 add at end ifm >0O 

18 6 insert at beginning positive 

18 —10 of primes of positive primes 

21 —1 add at end and Zo, yo is a particular solution. 
22 —5 the integers the nonnegative integers 

67 —4 (mod m), (mod m) and prime to m, 

37 17 some 8B <a some 8 withO<B<a 

40 —15 (a — a)| f(z) f(a) =0 

43 —13 3-16 3-17 

43 —5 always form>1 

45 ~—14 omit ( 

85 4 4-2 4-1 

55 —15 r>1 r>0 

58 —14 < 29(m) = y(m) 


199 


200 
60 


61 


62 


69 


ERRATA, VOL. I 


insert at end 


w=, 2, ,..., m1 


equation (12) is 
solvable if 
property. 


add at end 


—2/q 

4q' +a’ 

1 and odd, put 
number 

of integers 


equal or distinct 


insert at beginning 


multiples 
greater 
all 

[log z] 
sequence 


6-14 


(Here, contrary to our convention, 
the letter e stands for the 
noninteger >, 1/n! = 2.718....) 

| 1, 2, yeeey My 

the congruence (12) is solvable 
with p{ xyz if 


property, except = itself. 


(Postpone this problem to Sec. 5-4.) 


1, odd, and prime to a, put 
number, relatively prime to the first. 
of positive integers 

(not necessarily distinct) 

exists and 

positive multiples 

smaller 

all sufficiently large z 

[log x] +1 

sequence of real numbers 


6-16 


109 


111 


113 


115 


115 


119 


121 


121 


124 


128 


129 


132 


134 


138 


141 


149 


149 


ERRATA, VOL. I 201 


Ca 
Pr” 


insert at end 


C2 = e24 


6-17 


ard 


Pr 


ayy 
Bk k= n 2 ~ 2 


omit remainder of proof 


n= 1 (3 occurrences) 
given; 


representations 
(2 occurrences) 


for x 

If the leading 
some a 

y’ lal - \(€ - €’)/2| 


replace the argument 


of min by 


O(log z) 
n=0 


given by Erdos; 


proper representations 

units, and is not itself a unit, 
0<v<3 

for Zo 

If 6 = 0 or the leading 


some rational number a 


y’ lal - 3(E — €’)/2| 


( 2 C+ olel-e- £1) 
le 2 


202 


150 


150 


151 


151 


164 


167 


174 


175 


175 


177 


177 


180 


190 


190 


ERRATA, VOL. I 


replace displayed 


equations by 


the quantities in (16) 


(18). 

dy? 

k=0,1, ... 
Qe-12k + Pr—2 
(—1)* 


V7-1 
3 


(2 occurrences) 


1+ 


oe 
V7-1 


-1+ v3 
2 


the whole line 


E 1n+k 
one set 


every subset 


The product of the two 
quantities in (16) is constant, 


them 

(18) in which alye. 
Vdy’ 

b= =), 0, 15 ass 
Qk—-12k + Qe-2 


(-1) 


= 3- V3. 


1 

3+ Jerr 
veer 

Eny +k 

one infinite set 


every infinite subset 


TOPICS IN 
NUMBER THEORY 


VOLUME II 


PREFACE 


This book is a treatment of some advanced topics in the theory of 
numbers. It was written to follow the author’s “Topics in Number
Theory, Volume I,” in which elementary number theory is presented.

The level of mathematical maturity required for Volume II is
much higher than for Volume I. Moreover, results obtained in 
Volume I are used freely, and in several of the chapters a knowledge of 
specific topics in various other branches of mathematics is assumed. 
In particular, knowledge of the theory of symmetric polynomials, as 
well as the rule for multiplying determinants, is needed for the 
algebraic theory in Chapter 3, and the theory of analytic functions is 
used both in the theorem of Schneider in Chapter 5 and in the in- 
vestigation of the distribution of primes in Chapter 7. There seemed 
to be no point in assuming background unnecessarily, however, so I 
have included brief discussions of groups and matrices, on a very 
elementary level, in Chapter 1. 

The treatment of quadratic forms, admittedly shallow, has been 
based on the properties of the modular group for two reasons. In the 
first place, the geometric interpretation makes the usual definition of 
reduced forms seem quite natural, while no real insight is afforded by 
merely listing an unmotivated set of inequalities. In the second 
place, this treatment provides a simple illustration of the power of the 
theory associated with elliptic functions, which is of considerable 
importance in modern number theory. Such methods are not often 
taught in American universities, and I hope that this treatment may 
serve to stimulate interest in them. 

To the best of my knowledge, the algebraic form of the Thue- 
Siegel-Roth theorem given in Chapter 4 has not previously appeared 
in print. 

W. J. L. 
Ann Arbor, Michigan 
November 1955 




CONTENTS 


CHAPTER 1 BINARY QUADRATIC FORMS

1-1 Introduction
1-2 Groups
1-3 The modular group
1-4 Reduced definite forms
1-5 Reduction of definite forms
1-6 Representations by definite forms
1-7 Indefinite forms
1-8 The automorphs of indefinite forms
1-9 Reduction of indefinite forms
1-10 Representations


CHAPTER 2 ALGEBRAIC NUMBERS

2-1 Introduction
2-2 Polynomials and algebraic numbers
2-3 Algebraic integers
2-4 Units and primes in $R[\vartheta]$
2-5 Ideals
2-6 The arithmetic of ideals
2-7 Congruences. The norm of an ideal
2-8 Prime ideals
2-9 Units of algebraic number fields


CHAPTER 3 APPLICATIONS TO RATIONAL NUMBER THEORY

3-1 Introduction
3-2 Equivalence and class number
3-3 The cyclotomic field $K_p$
3-4 Fermat’s equation
3-5 Kummer’s theorem
3-6 The equation $x^2 + 2 = y^3$
3-7 Pure cubic fields
3-8 Two lemmas
3-9 The Delaunay-Nagell theorem


CHAPTER 4 THE THUE-SIEGEL-ROTH THEOREM

4-1 Introduction
4-2 Polynomials
4-3 Generalized Wronskians
4-4 The index
4-5 A combinatorial lemma
4-6 The approximation polynomial
4-7 The Thue-Siegel-Roth theorem
4-8 Applications to Diophantine equations
4-9 A special equation


CHAPTER 5 IRRATIONALITY AND TRANSCENDENCE

5-1 Irrational numbers
5-2 The existence of transcendental numbers
5-3 A criterion for transcendence
5-4 Measure of transcendence. Mahler’s classification
5-5 Arithmetic properties of the exponential function
5-6 A theorem of Schneider
5-7 The Hilbert-Gelfond-Schneider theorem


CHAPTER 6 DIRICHLET’S THEOREM

6-1 Introduction
6-2 Characters
6-3 The L-functions
6-4 Nonelementary proof of Dirichlet’s theorem
6-5 Elementary proof of Dirichlet’s theorem
6-6 Proof that $L(1, \chi) \ne 0$


CHAPTER 7 THE PRIME NUMBER THEOREM

7-1 Introduction
7-2 Preliminary results
7-3 The Prime Number Theorem
7-4 Extension to primes in an arithmetic progression
7-5 The integers representable as a sum of two squares


SUPPLEMENTARY READING
LIST OF SYMBOLS
INDEX
ERRATA


CHAPTER 1 


BINARY QUADRATIC FORMS 


1-1 Introduction. One of the subjects treated in elementary 
number theory is the possibility of representing a positive integer as 
a sum of two squares.* The expression $x^2 + y^2$ which is of interest
for this problem is a special case of the general binary quadratic form

\[ f(x, y) = ax^2 + bxy + cy^2. \tag{1} \]


(This in turn is a special case of the n-ary m-ic form, which is a 
homogeneous polynomial of degree m in n variables.) Systematic 
research in quadratic forms was begun by Gauss, and has since been 
extensively pursued. We shall not go very deeply into the subject, 
but prefer instead to develop general methods whose usefulness is not 
limited to the theory of quadratic forms, nor even to the theory of 
numbers. 

Suppose that in (1) we make the linear homogeneous substitution

\[ x = \alpha x' + \beta y', \qquad y = \gamma x' + \delta y', \tag{2} \]

where $\alpha$, $\beta$, $\gamma$, and $\delta$ are integers and $D = \alpha\delta - \beta\gamma \ne 0$. Solving for
$x'$ and $y'$ gives

\[ x' = \frac{\delta}{D}\,x - \frac{\beta}{D}\,y, \qquad y' = -\frac{\gamma}{D}\,x + \frac{\alpha}{D}\,y, \tag{3} \]

so it is only in case $D = \pm 1$ that to each integer pair $x$, $y$ corresponds
an integer pair $x'$, $y'$ and conversely. We shall eventually suppose
that $D = +1$, for reasons that will appear later. Then

\[ x' = \delta x - \beta y, \qquad y' = -\gamma x + \alpha y. \tag{4} \]


*See, for example, LeVeque, Topics in Number Theory, vol. I (Reading,
Mass.: Addison-Wesley Publishing Company, Inc., 1956), Chapter 7. So
much use will be made of the results obtained in this book that it will be
referred to henceforth simply as Volume I.


Substituting (2) into (1), we have a new form in $x'$ and $y'$,

\[ g(x', y') = Ax'^2 + Bx'y' + Cy'^2, \]

where

\[ \begin{aligned} A &= a\alpha^2 + b\alpha\gamma + c\gamma^2, \\ B &= 2a\alpha\beta + b(\alpha\delta + \beta\gamma) + 2c\gamma\delta, \\ C &= a\beta^2 + b\beta\delta + c\delta^2. \end{aligned} \tag{5} \]

If for suitable integral values of $x$ and $y$ we have $f(x, y) = n$, then, for
the corresponding values of $x'$ and $y'$ determined by equation (4),
$g(x', y') = n$.
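Equations (5) are easy to check by machine. The following Python sketch computes the coefficients of $g$ from those of $f$ and a substitution (2), and confirms on a small grid that $g(x', y') = f(x, y)$. The function names and the sample form $x^2 + y^2$ with substitution $x = x' + y'$, $y = y'$ are choices made for this illustration.

```python
def substitute(form, matrix):
    """Coefficients (A, B, C) of g, from f = (a, b, c) and the substitution
    x = alpha*x' + beta*y', y = gamma*x' + delta*y', following equations (5)."""
    a, b, c = form
    (alpha, beta), (gamma, delta) = matrix
    A = a * alpha**2 + b * alpha * gamma + c * gamma**2
    B = 2 * a * alpha * beta + b * (alpha * delta + beta * gamma) + 2 * c * gamma * delta
    C = a * beta**2 + b * beta * delta + c * delta**2
    return A, B, C

def evaluate(form, x, y):
    """Value of a*x^2 + b*x*y + c*y^2."""
    a, b, c = form
    return a * x * x + b * x * y + c * y * y

f = (1, 0, 1)                    # x^2 + y^2
M = ((1, 1), (0, 1))             # x = x' + y', y = y'
g = substitute(f, M)
print(g)

# g(x', y') agrees with f(x, y) at corresponding integer points.
for xp in range(-3, 4):
    for yp in range(-3, 4):
        x, y = xp + yp, yp
        assert evaluate(g, xp, yp) == evaluate(f, x, y)
```

Here the determinant of the substitution is 1, so the correspondence between integer pairs runs in both directions.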

It thus appears that, as far as questions of representation are
concerned, it would be senseless duplication to consider $f(x, y)$ and
$g(x', y')$ separately; every integer represented by $f$ is also
represented by $g$, and conversely. This leads us to call $f$ and $g$ equivalent,
and to write $f \sim g$, if one can be obtained from the other by a
unimodular linear substitution with integral coefficients,

\[ x = \alpha x' + \beta y', \qquad y = \gamma x' + \delta y', \qquad \alpha\delta - \beta\gamma = 1. \tag{6} \]

This in turn brings up one of the principal problems of this chapter:
how to decide whether two given forms are equivalent.

The substitution (2) is described quite adequately by specifying
the coefficients $\alpha$, $\beta$, $\gamma$, and $\delta$; that is, by writing the matrix

\[ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}. \]

This symbol does not represent a number, of course; it is simply a
list of the coefficients of the substitution, in the order in which they
occur in (2). However, we can give names to these matrices, and
deduce certain of their properties from the corresponding properties
of the associated substitutions. Thus if

\[ M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \quad\text{and}\quad M' = \begin{pmatrix} \alpha' & \beta' \\ \gamma' & \delta' \end{pmatrix}, \]

then we shall say that $M$ and $M'$ are equal if and only if they
correspond to the same substitution, that is,

\[ \alpha = \alpha', \quad \beta = \beta', \quad \gamma = \gamma', \quad \delta = \delta'. \]

If for arbitrary $M$ and $M'$ we apply the corresponding substitutions
successively, so that

\[ \begin{aligned} x &= \alpha x' + \beta y', & x' &= \alpha' x'' + \beta' y'', \\ y &= \gamma x' + \delta y', & y' &= \gamma' x'' + \delta' y'', \end{aligned} \]

we could accomplish the same thing by the single substitution

\[ \begin{aligned} x &= (\alpha\alpha' + \beta\gamma')x'' + (\alpha\beta' + \beta\delta')y'', \\ y &= (\gamma\alpha' + \delta\gamma')x'' + (\gamma\beta' + \delta\delta')y''. \end{aligned} \]

Thus, if by the product $MM'$ of two matrices we mean the matrix of
this latter substitution, we must define

\[ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} \alpha' & \beta' \\ \gamma' & \delta' \end{pmatrix} = \begin{pmatrix} \alpha\alpha' + \beta\gamma' & \alpha\beta' + \beta\delta' \\ \gamma\alpha' + \delta\gamma' & \gamma\beta' + \delta\delta' \end{pmatrix}. \]

Thus the product has as element in the $i$th row and $j$th column, for
each $i$ and $j$, the sum of the products of the elements of the $i$th row of
the first matrix with the corresponding elements of the $j$th column of
the second matrix. Moreover, if the determinant of a matrix is defined
as

\[ \det \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \begin{vmatrix} \alpha & \beta \\ \gamma & \delta \end{vmatrix} = \alpha\delta - \beta\gamma, \]

it requires only a routine calculation to show that

\[ \det M \cdot \det M' = \det (MM'). \]

It is to be noticed that, in general, $MM' \ne M'M$, although

\[ M(M'M'') = (MM')M''. \]

Since the substitutions given by (2) and (3) are inverse to each
other, it is natural to call the matrix of (3) the inverse of the matrix $M$
of (2), and to designate it by $M^{-1}$. Then $MM^{-1} = M^{-1}M = I$,
where

\[ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{7} \]

$I$ is called the identity matrix; it corresponds to the trivial
substitution $x = x'$, $y = y'$, and has the property that $MI = IM = M$ for
every $M$. A square matrix has an inverse if and only if its determinant
is different from zero.

Finally, we designate by $M^T$ the transpose of $M$, obtained by
interchanging rows and columns in $M$:

\[ \text{if } M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}, \text{ then } M^T = \begin{pmatrix} \alpha & \gamma \\ \beta & \delta \end{pmatrix}. \]

The transpose of a product is the product of the transposes, in reverse
order:

\[ (MM')^T = M'^T M^T. \]

Also, the transpose of the inverse is equal to the inverse of the
transpose:

\[ (M^{-1})^T = (M^T)^{-1}. \]

Matrices need not be square. Thus

\[ \text{if } X = (x \;\; y), \text{ then } X^T = \begin{pmatrix} x \\ y \end{pmatrix}; \]

note, however, that nonsquare matrices have neither determinants
nor inverses.

The importance of this algebra of matrices to our present purpose
lies in the fact that if

\[ F = \begin{pmatrix} a & \tfrac{1}{2}b \\ \tfrac{1}{2}b & c \end{pmatrix}, \]

then

\[ XFX^T = (x \;\; y) \begin{pmatrix} a & \tfrac{1}{2}b \\ \tfrac{1}{2}b & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \bigl( ax + \tfrac{1}{2}by \;\;\; \tfrac{1}{2}bx + cy \bigr) \begin{pmatrix} x \\ y \end{pmatrix} = (ax^2 + bxy + cy^2). \]

Although it is a slight abuse of language, it is convenient and in the
present context harmless to identify a one-by-one matrix with the
element itself, so we write

\[ f(x, y) = XFX^T. \]

$F$ is called the matrix of the form, and $\Delta = 4 \cdot \det F = 4ac - b^2$ is
called the discriminant of the form.

In terms of matrices, the substitution equations (2) and (3) can
be written as

\[ X = X'M^T \quad\text{and}\quad X' = X(M^T)^{-1}, \]

respectively. Thus

\[ f(x, y) = XFX^T = (X'M^T)F(X'M^T)^T = (X'M^T)F(MX'^T) = X'(M^TFM)X'^T, \]

so that the matrix of $g$ is $G = M^TFM$. (The reader might test his
ability to manipulate matrices by showing that the last equation is in
agreement with equations (5).) Multiplying both sides of the equation
$G = M^TFM$ by $(M^T)^{-1}$ on the left and $M^{-1}$ on the right, we have

\[ (M^T)^{-1}GM^{-1} = (M^T)^{-1}(M^TFM)M^{-1} = \bigl((M^T)^{-1}M^T\bigr)F(MM^{-1}) = F. \]

If $\det M = 1$, then also $\det M^T = 1$, and

\[ \det G = \det (M^TFM) = \det M^T \cdot \det F \cdot \det M = \det F, \]

so that the discriminant of a form is not changed by a unimodular
substitution.

In summary, a form with matrix $F$ is equivalent to a form with
matrix $G$ if and only if there is a matrix $M$ such that $G = M^TFM$ and
$\det M = +1$; equivalent forms have the same discriminant and
represent the same integers.
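The summary can be illustrated with a few lines of Python. The sketch below forms the product of the transpose of $M$, the matrix $F$, and $M$ for the sample form $x^2 + y^2$ and the unimodular matrix with rows (1, 1) and (0, 1), then compares discriminants. All function names are assumptions introduced here, and matrices are stored as tuples of rows purely for convenience.

```python
from fractions import Fraction

def matmul(P, Q):
    """Row-by-column product of two matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(len(Q)))
              for j in range(len(Q[0])))
        for i in range(len(P))
    )

def transpose(P):
    """Interchange rows and columns."""
    return tuple(zip(*P))

def det2(P):
    """Determinant of a two-by-two matrix."""
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def form_matrix(a, b, c):
    """Matrix of the form a*x^2 + b*x*y + c*y^2 (off-diagonal entries b/2)."""
    half_b = Fraction(b, 2)
    return ((Fraction(a), half_b), (half_b, Fraction(c)))

def transform(F, M):
    """Matrix of the form obtained from F by the substitution with matrix M."""
    return matmul(matmul(transpose(M), F), M)

F = form_matrix(1, 0, 1)           # x^2 + y^2
M = ((1, 1), (0, 1))               # x = x' + y', y = y'
G = transform(F, M)
print(G, 4 * det2(F), 4 * det2(G))

# Matrix multiplication is not commutative in general.
N = ((1, 0), (1, 1))
print(matmul(M, N), matmul(N, M))
```

Reading off $G$ gives the form $x'^2 + 2x'y' + 2y'^2$, whose discriminant agrees with that of $x^2 + y^2$.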

The relation of “equivalence,” as used here, is an equivalence
relation in the technical sense.* For it is clear that

(a) $f \sim f$: $F = I^TFI$;

(b) $f \sim g$ implies $g \sim f$: $G = M^TFM$ implies $F = (M^{-1})^TGM^{-1}$;

(c) $f \sim g$ and $g \sim h$ implies $f \sim h$: $G = M^TFM$ and $H = M'^TGM'$
implies $H = M'^TM^TFMM' = (MM')^TF(MM')$.

Thus all the forms equivalent to a given one are equivalent to each
other, and the set of all forms splits up into equivalence classes, any
two elements in one class being equivalent, and elements from
different classes being inequivalent. (The equivalence classes for the
relation of congruence (mod $m$) are simply the residue classes modulo $m$.)
Just as we chose a system of representatives of the various residue
classes modulo $m$, we would like to pick a system of representative
forms, one from each class. It is the object of the next two sections to
develop machinery by which such reduced forms can be obtained in a
natural way.


*See, for example, Volume I, Section 3-1.



PROBLEMS 


1. Give proofs of the following statements, for the case of two-by-two
matrices:

(a) for some $M$ and $M'$, $MM' \ne M'M$,
(b) $M(M'M'') = (MM')M''$,
(c) $MM^{-1} = M^{-1}M = I$, $IM = MI = M$,
(d) $(MM')^T = M'^TM^T$, $(M^T)^{-1} = (M^{-1})^T$.

2. Verify directly, and also by matrix multiplication, that under the
substitution

\[ x = 2x' + 3y', \qquad y = x' + 2y', \]

the form $F(x, y) = 3x^2 - 7xy + 4y^2$ goes into $G(x', y') = 2x'^2 + 3x'y' + y'^2$.
Compute the inverse of the matrix of the substitution, and so carry $G$ back
into $F$.


1-2 Groups. We say that a set $G$ of elements $a, b, \ldots$, which
need not be numbers, forms a group with respect to a certain
operation (designated for the moment by the symbol “$\circ$”), which combines
two elements to form a third, if

(a) for every $a$ and $b$ in $G$, $a \circ b$ is in $G$,

(b) the operation is associative, so that $a \circ (b \circ c) = (a \circ b) \circ c$, for
every $a$, $b$, and $c$ in $G$,

(c) there is an identity element $e$, such that $a \circ e = e \circ a = a$ for
every $a$ in $G$,

(d) every $a$ in $G$ has an inverse $a^{-1}$ in $G$, such that

\[ a \circ a^{-1} = a^{-1} \circ a = e. \]


Perhaps the simplest example of a group is the set of all integers,
under the operation addition. In this case, the number 0 is the
identity, since $a + 0 = 0 + a = a$, and the inverse of $a$ is $-a$, since
$a + (-a) = (-a) + a = 0$. This group is infinite (i.e., has
infinitely many elements), but the group consisting of the four
numbers $1$, $i$, $-1$, and $-i$, under the operation multiplication, is
finite. If the operation consists of adding two integers and reducing
the result to the least positive residue (mod $m$), then the numbers
$1, 2, \ldots, m$ form a group, with $m$ as the identity. Instead of a
complete residue system we could consider a reduced residue system
(mod $m$); these numbers form a group $M(m)$ in which the operation
is ordinary multiplication followed by reduction (mod $m$). This is
not quite so obvious; the identity is clearly the number $e \equiv 1 \pmod{m}$,
but the existence of inverses depends on the fact that the congruence
$ax \equiv 1 \pmod{m}$ is solvable if $(a, m) = 1$. Many of the results in the
congruence theory which are obtained in beginning texts are simply
special cases of general theorems about finite groups; it might be of
interest to examine this relationship briefly before proceeding.

A subset $G_1$ of the elements of a group $G$ is said to form a subgroup
of $G$ if it also forms a group with respect to the operation of $G$.
Condition (b) is automatically satisfied in this case, so that one need only
verify that (a) holds (which we express by saying that $G_1$ is closed
under the operation), that the identity is in $G_1$, and that the inverse
of each element of $G_1$ is in $G_1$.

The number of elements in a finite group is called the order of the
group; we now show that the order of a subgroup $G_1$ divides the order
of the group $G$. Suppose that $a$ is an element of $G$ which is not in $G_1$,
and let $aG_1$ be the set of all “products” $a \circ g$, where $g$ runs through $G_1$.
Then no element $a \circ g$ is in $G_1$, for if it were, the same would be true of
$a \circ g \circ g^{-1} = a$. Also, if $g_1$ and $g_2$ are distinct elements of $G_1$, then
$a \circ g_1 \ne a \circ g_2$, since otherwise $a^{-1} \circ a \circ g_1 = a^{-1} \circ a \circ g_2$, or $g_1 = g_2$.
Now suppose that $b$ in $G$ is not in either $G_1$ or $aG_1$; then as before, no
element of $bG_1$ is in $G_1$ or $aG_1$, and elements of $bG_1$ arising from
different elements of $G_1$ are different. This process can be continued
until every element of $G$ is in precisely one of the sets $G_1, aG_1, bG_1, \ldots$,
and each set contains exactly $m$ distinct elements, where $m$ is the order
of $G_1$. Clearly, if there are $t$ such sets, then the order of $G$ is $mt$,
which is divisible by $m$.
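The counting argument above can be followed on a small example. The Python sketch below takes the group of reduced residues modulo 15 under multiplication and the subgroup of powers of 2, and splits the group into the sets $aG_1$ in the way the proof describes. The function names and the choice of modulus are assumptions made for the illustration.

```python
from math import gcd

def reduced_residues(m):
    """The elements of a reduced residue system modulo m, taken in 1..m."""
    return [a for a in range(1, m + 1) if gcd(a, m) == 1]

def generated_subgroup(a, m):
    """The powers of a modulo m, for a prime to m."""
    powers, x = [], a % m
    while x not in powers:
        powers.append(x)
        x = (x * a) % m
    return sorted(powers)

def cosets(group, subgroup, m):
    """Partition `group` into sets a*G1 (mod m), picking each new a outside
    the sets already formed, as in the argument of the text."""
    remaining, parts = set(group), []
    for a in group:
        if a in remaining:
            coset = sorted({(a * g) % m for g in subgroup})
            parts.append(coset)
            remaining -= set(coset)
    return parts

m = 15
G = reduced_residues(m)
G1 = generated_subgroup(2, m)
parts = cosets(G, G1, m)
print(G1, parts)
print(len(G) == len(G1) * len(parts))
```

Each set in the partition has as many elements as the subgroup, which is the divisibility statement in miniature.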

If $a$ is any element of $G$, then the powers of $a$ (that is, $a$, $a^2 = a \circ a$,
$a^3 = a \circ a \circ a, \ldots$) are all in $G$; we easily see that if $G$ is finite, these
powers form a subgroup, whose order $m$ is such that $a^m = e$. Hence,
if the order of $G$ is $mt$, then $a^{mt} = e$. For the group $M(m)$ defined
above, whose order is $\varphi(m)$, this statement reduces to Euler's theorem,
which states that $a^{\varphi(m)} \equiv 1 \pmod{m}$ if $(a, m) = 1$. Many of the
other results of Volume I follow immediately from these remarks
about groups. For example, Theorem 3-19, which says that if $a$
belongs to $t$ (mod $m$) then $t \mid \varphi(m)$, can be reworded in the language of
groups to read, “The order of the cyclic subgroup generated by $a$
(i.e., the group of powers of $a$) divides the order of the group.” A
primitive root of $q$ is a generator of $M(q)$; thus Theorem 4-11 is a
statement of the fact that $M(q)$ is cyclic (consists of the powers of
a single element) if and only if $q = 1, 2, 4, p^\alpha, 2p^\alpha$. If $\alpha > 2$, $M(2^\alpha)$
has two generators, $-1$ and 5; for example, modulo 16, the powers of
5 are 5, 9, 13, and 1, and these numbers, together with their negatives,
form a reduced residue system.
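These statements are simple to test for small moduli. The following Python sketch, with hypothetical helper names `phi` and `order`, checks Euler's theorem and the divisibility of $\varphi(m)$ by the order of each element for $m < 50$, and reproduces the example modulo 16.

```python
from math import gcd

def phi(m):
    """Euler's function: the number of a in 1..m with (a, m) = 1."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)

def order(a, m):
    """Smallest t >= 1 with a^t congruent to 1 (mod m); a must be prime to m."""
    t, x = 1, a % m
    while x != 1 % m:
        x = (x * a) % m
        t += 1
    return t

# Euler's theorem, and the order of a dividing phi(m).
for m in range(2, 50):
    for a in range(1, m):
        if gcd(a, m) == 1:
            assert pow(a, phi(m), m) == 1
            assert phi(m) % order(a, m) == 0

# Powers of 5 modulo 16, and the residues obtained by adjoining their negatives.
powers_of_5 = [pow(5, k, 16) for k in range(1, 5)]
with_negatives = sorted(set(powers_of_5) | {(-x) % 16 for x in powers_of_5})
print(powers_of_5, with_negatives)
```

The brute-force definitions are adequate only for small moduli; they are meant to mirror the definitions in the text rather than to be efficient.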

At present we shall do no more with finite groups, but turn our
attention instead to the much more complicated multiplicative group
of all two-by-two matrices with integral entries and unit determinants.
This infinite group, which will be designated here by $\Gamma$, is called the
modular group. To show that $\Gamma$ is a group, we verify properties (a)
through (d) above. The system is obviously closed under
multiplication, since the determinant of a product is the product of the
determinants of the factors. The associative property has already been
verified. The identity element of $\Gamma$ is $I$, as defined in (7). The
inverse of any element

\[ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \]

is

\[ \begin{pmatrix} \delta & -\beta \\ -\gamma & \alpha \end{pmatrix}. \]

The group $\Gamma$ differs from the other examples mentioned in that it is
noncommutative, since in general $MM' \ne M'M$. (Abstractly, $G$ is
said to be a commutative or abelian group if $a \circ b = b \circ a$ for every $a$
and $b$ in $G$.)


1-3 The modular group. The properties of $\Gamma$ could all be
developed by the use of algebra alone; we prefer instead to build up the
theory with the help of a simple geometric interpretation. It is now
convenient to reverse the roles of the accented and unaccented
variables in the equations (2); this new notation will be used
throughout the discussion of the modular group, but the original system will
be reverted to when quadratic forms are again considered. To keep
matters straight, (2) will be termed a substitution, while the modified
equations will be called a transformation. Putting $z = x/y$ and
$z' = x'/y'$, we get

\[ z' = \frac{\alpha z + \beta}{\gamma z + \delta}. \tag{8} \]

So far nothing essential has been accomplished. The crucial point
lies in allowing $z$ to range over all complex numbers, rather than the
real rationals to which it was formerly restricted. Then equation (8)
can be regarded as defining a transformation or mapping of the
complex $z$-plane into the $z'$-plane. Somewhat more than this can be said:
if

\[ M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \]

is in $\Gamma$, so that $\det M = 1$, a simple calculation shows that the
imaginary parts of $z$ and $z'$ have the same sign. In other words, (8) maps
the upper half of the $z$-plane (i.e., the region where the imaginary part
of $z$ is positive) into the upper half of the $z'$-plane, and the lower half
into the lower half. Hereafter, we restrict attention to the upper half
planes.
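The "simple calculation" can be made concrete. For real $\alpha, \beta, \gamma, \delta$ one finds that the imaginary part of $z'$ equals $(\alpha\delta - \beta\gamma)$ times the imaginary part of $z$, divided by $|\gamma z + \delta|^2$; this is a standard identity, stated here as a supplement rather than quoted from the text. The Python sketch below checks it numerically for one sample matrix of determinant 1; the function name `mobius` and the sample values are assumptions.

```python
def mobius(M, z):
    """Apply the transformation z' = (alpha*z + beta) / (gamma*z + delta)."""
    (alpha, beta), (gamma, delta) = M
    return (alpha * z + beta) / (gamma * z + delta)

M = ((2, 1), (1, 1))          # integral entries, determinant 2*1 - 1*1 = 1
z = complex(0.3, 0.7)         # a point of the upper half plane
w = mobius(M, z)

# Im z' = Im z / |gamma*z + delta|^2 when the determinant is 1.
(alpha, beta), (gamma, delta) = M
predicted = z.imag / abs(gamma * z + delta) ** 2
print(w.imag, predicted)

# The lower half plane goes to the lower half plane.
print(mobius(M, z.conjugate()).imag < 0)

# Changing the sign of every entry gives the same transformation.
minus_M = ((-2, -1), (-1, -1))
print(abs(mobius(minus_M, z) - w) < 1e-12)
```

Since $|\gamma z + \delta|^2$ is positive for nonreal $z$, the sign of the imaginary part is preserved whenever the determinant is positive.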

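It is convenient to identify the $z$- and $z'$-planes, and to think of (8)
as sending each point $z$ of the upper half $U$ of the complex plane into
another point $z'$ of $U$. We also identify the elements of $\Gamma$ with
the corresponding transformations (8), which has the effect of
identifying the matrices

$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -\alpha & -\beta \\ -\gamma & -\delta \end{pmatrix}.$$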

In accordance with the earlier definition of equivalence, two points
$z$ and $z'$ of $U$ will be called equivalent if one can be mapped into the
other by a transformation of $\Gamma$. As usual, this assigns each point
of $U$ to an equivalence class; two elements of the same class are
equivalent, and elements from different classes are inequivalent. A
region $R$ of $U$ is called a fundamental region if no two of its points
are equivalent, while every point of $U$ is equivalent to a point of $R$;
in other words, $R$ constitutes a complete system of representatives of
the above equivalence classes. It would be more precise to refer to $R$
as a fundamental region of the group $\Gamma$, since two points may be
equivalent with respect to one group of transformations but not with
respect to another. For example, it is clear that a fundamental region
$R'$ of a subgroup $\Gamma'$ of $\Gamma$ contains a fundamental region of
$\Gamma$ itself, if both regions exist. For any point in $U$, being
equivalent to some point of $R'$ under the transformations of $\Gamma'$,
is a fortiori equivalent to the same point of $R'$ under the
transformations of the larger group $\Gamma$. It may not be true,
however, that any two points of $R'$ are inequivalent with respect to
$\Gamma$.


THEOREM 1-1. The region $R$ in $U$ composed of all points $z$ such that
$-\tfrac12 \le \operatorname{Re} z < \tfrac12$ and either $|z| > 1$, or else
$|z| = 1$ and $-\tfrac12 \le \operatorname{Re} z \le 0$, is a fundamental
region of $\Gamma$. (See Fig. 1-1.)


Proof: First note that $\Gamma$ has the subgroup $\Gamma_0$ of all
integral translations $z' = z + \beta$. For the associated matrix

$$\begin{pmatrix} 1 & \beta \\ 0 & 1 \end{pmatrix}$$

has determinant 1, the identity transformation $z' = z$ is in $\Gamma_0$,
the inverse of $z' = z + \beta$ is $z' = z - \beta$ and is in $\Gamma_0$,
and the result of making two translations is again a translation.
$\Gamma_0$ is cyclic, being generated by

$$z' = z + 1. \qquad (9)$$

As a fundamental region of $\Gamma_0$ we could choose any infinite strip
in $U$ of unit width, extending parallel to the imaginary axis from the
real axis. We take the following one:

$$R_0:\quad \operatorname{Im} z > 0, \qquad -\tfrac12 \le \operatorname{Re} z < \tfrac12.$$

FIGURE 1-1


From the remark preceding the theorem, $R_0$ must contain a
fundamental region of $\Gamma$ if any exists. $R_0$ is not itself a
fundamental region of $\Gamma$, however, for the point $i/2$ of $R_0$ is
transformed into the point $2i$ of $R_0$ by the transformation

$$z' = -\frac{1}{z}. \qquad (10)$$

With each transformation

$$T:\quad z' = \frac{\alpha z + \beta}{\gamma z + \delta}$$

with $\gamma \ne 0$, there is associated the circle $C(T)$:
$|\gamma z + \delta| = 1$, with center at $-\delta/\gamma$ and radius
$1/|\gamma| \le 1$. Now

$$z' - \frac{\alpha}{\gamma} = \frac{\alpha z + \beta}{\gamma z + \delta} - \frac{\alpha}{\gamma} = \frac{-1}{\gamma(\gamma z + \delta)},$$

so that $C(T)$ is transformed by $T$ into $|\gamma z - \alpha| = 1$, which,
by (3), is $C(T^{-1})$. More importantly, the exterior of $C(T)$ goes into
the interior of $C(T^{-1})$. It is simple to deduce from this that no two
points of the region $R$ described in the theorem are equivalent.
Certainly no point of $R$ is mapped into another by an element of
$\Gamma_0$. But if $T$ is not in $\Gamma_0$, then $\gamma \ne 0$, and since the
interior of $R$ is external to all the circles $C(T)$ (inasmuch as they
all have radii $\le 1$ and are centered at real points), any interior
point of $R$ is mapped by $T$ into an interior point of one of these
circles, and hence into a point outside $R$.

The arc $A$: $|z| = 1$, $-\tfrac12 \le \operatorname{Re} z \le 0$, which forms
part of the boundary of $R$, is also completely exterior to all the
circles $C(T)$ except $|z| = 1$ and $|z + 1| = 1$. The circle $|z| = 1$ is
associated with transformations

$$z' = \frac{\alpha z + \beta}{z},$$

and since the determinant must be 1, $\beta = -1$ and

$$z' = \alpha - \frac{1}{z} = \alpha - \frac{\bar z}{|z|^2}.$$

If $z$ is a point of $A$, $|z' - \alpha| = 1$, and so $z'$ is not in $R$ unless
$\alpha = 0$ or $-1$. If $\alpha = 0$ we have the transformation (10), which
sends $A$ onto the arc $|z| = 1$, $0 \le \operatorname{Re} z \le \tfrac12$; this
arc has only the point $i$ in common with $R$, and $i$ goes into itself.
(This means that $i$ is equivalent to itself in two different ways:
$z' = z$ and $z' = -1/z$.) If $\alpha = -1$, $A$ goes into the arc
$|z + 1| = 1$, $-1 \le \operatorname{Re} z \le -\tfrac12$; these two arcs have
just $\rho = -\tfrac12 + i\sqrt3/2$ in common, and $\rho$ goes into itself.

The circle $|z + 1| = 1$ is associated with transformations

$$z' = \frac{\alpha z + \beta}{z + 1} = \frac{\alpha z + (\alpha - 1)}{z + 1} = \alpha - \frac{1}{z + 1}.$$

If $|z| = 1$, then

$$\left| \frac{z' - (\alpha - 1)}{z' - \alpha} \right| = |{-z}| = 1, \qquad |z' - (\alpha - 1)| = |z' - \alpha|, \qquad \operatorname{Re} z' = \alpha - \tfrac12,$$

and $z'$ is not in $R$ unless $\alpha = 0$. Under the transformation

$$z' = \frac{-1}{z + 1}$$

the arc $A$ goes into the line segment $\operatorname{Re} z = -\tfrac12$,
$\tfrac12 \le \operatorname{Im} z \le \sqrt3/2$; the arc and the segment have
just $\rho$ in common, and $\rho$ goes into itself.


We have thus shown that no two points of R are equivalent, and have 
incidentally obtained the following result, which will be useful later. 


THEOREM 1-2. The point $\rho = (-1 + i\sqrt3)/2$ is mapped into
itself by the three transformations

$$z' = z, \qquad z' = \frac{-1}{z + 1}, \qquad\text{and}\qquad z' = -1 - \frac{1}{z},$$

and by no others. The point $i$ is mapped into itself by the two
transformations

$$z' = z \qquad\text{and}\qquad z' = -\frac{1}{z},$$

and by no others. Any point of $R$ different from $\rho$ and $i$ is mapped
into itself only by the identity transformation $z' = z$.


To complete the proof of Theorem 1-1, we must show that any
point $z$ in $U$ is the image of a point in $R$ under a transformation of
$\Gamma$. We do this by finding a finite sequence of transformations such
that if they are successively applied to $z$, the final point $z'$ is in
$R$. Then the inverse of the product of these transformations maps $z'$
back into $z$.

Designate by $S$ the generator (9) of $\Gamma_0$, and by $W$ the
transformation (10). Let $z$ be a point of $U$ not in $R$. Then for some
integer $n_1$, which may be positive, negative, or zero,
$z_1 = S^{n_1}z = z + n_1$ is in $R_0$, the fundamental domain of
$\Gamma_0$. If $z_1$ is in $R$, we are finished. If $|z_1| = 1$ but
$0 < \operatorname{Re} z_1 < \tfrac12$, then $Wz_1$ is in $R$. If $|z_1| < 1$,
then $z_2 = Wz_1$ has modulus greater than 1. In fact, if
$z_1 = x_1 + iy_1$,

$$z_2 = x_2 + iy_2 = \frac{-x_1}{x_1^2 + y_1^2} + \frac{iy_1}{x_1^2 + y_1^2}, \qquad -\tfrac12 \le x_1 < \tfrac12,$$

so that $\operatorname{Im} z_2 > \operatorname{Im} z_1$, and if $y_1 < \tfrac12$,
then $\operatorname{Im} z_2 > 2\operatorname{Im} z_1$, since then
$x_1^2 + y_1^2 < \tfrac12$. If $z_2$ is in $R$, we are through. If not, there
is a suitable exponent $n_2$ such that $z_3 = S^{n_2}z_2$ is in $R_0$, and
$\operatorname{Im} z_3 = \operatorname{Im} z_2$. If $z_3$ is not in $R$, we can
apply $W$ again, and get $z_4 = WS^{n_2}WS^{n_1}z$. What we must show is
that after finitely many steps, this process leads to a point in $R$.

As long as $y_k < \tfrac12$, we will have $y_{k+1} > 2y_k$, if
$z_{k+1} = Wz_k$. Starting with a positive number (the imaginary part of
$z$), a finite number of doublings will produce a number at least as
large as $\tfrac12$. So suppose that we have obtained a
$z_k = x_k + iy_k$ such that

$$-\tfrac12 \le x_k < \tfrac12, \qquad y_k \ge \tfrac12, \qquad x_k^2 + y_k^2 < 1. \qquad (11)$$

Then

$$z_{k+1} = -\frac{1}{z_k}, \qquad z_{k+2} = z_{k+1} + n = -\frac{1}{z_k} + n,$$

where $n$ is so determined that $-\tfrac12 \le \operatorname{Re} z_{k+2} < \tfrac12$.
This gives

$$z_{k+2} = \frac{nx_k - 1 + iny_k}{x_k + iy_k},$$

so that

$$|z_{k+2}|^2 = \frac{(nx_k - 1)^2 + n^2y_k^2}{x_k^2 + y_k^2}.$$

If $|n| \ge 2$,

$$|z_{k+2}|^2 \ge \frac{4y_k^2}{x_k^2 + y_k^2} > 1,$$

while if $|n| = 1$, the hypothetical inequality $|z_{k+2}|^2 < 1$ gives

$$(x_k - n)^2 + y_k^2 < x_k^2 + y_k^2,$$

which says that $z_k$ is farther from the origin than from the point
$z = n$, which is false from the first inequality of (11). Finally, if
$n = 0$, then $|z_{k+2}|^2 = 1/|z_k|^2 > 1$. Hence in all cases,
$|z_{k+2}| \ge 1$, and $-\tfrac12 \le \operatorname{Re} z_{k+2} < \tfrac12$. If
$z_{k+2}$ is still not in $R$ (which may happen if $|z_{k+2}| = 1$) then
$Wz_{k+2}$ is in $R$, and the proof is complete.

Moreover, the proof has shown that $S$ and $W$ are generators of
$\Gamma$, since every transformation of $\Gamma$ can be written in the form

$$S^{n_k}W \cdots W S^{n_2} W S^{n_1}.$$
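
The procedure used in the proof can be sketched in code. The following Python sketch is an added illustration, not part of the original text. It assumes that $z = x + iy$ has rational coordinates, so that exact arithmetic is possible, and the names `in_R` and `reduce_point` are hypothetical.

```python
from fractions import Fraction
from math import floor

HALF = Fraction(1, 2)

def in_R(x, y):
    """True if z = x + iy lies in the region R of Theorem 1-1."""
    if y <= 0 or not (-HALF <= x < HALF):
        return False
    norm = x * x + y * y                  # |z|^2
    return norm > 1 or (norm == 1 and x <= 0)

def reduce_point(x, y):
    """Move z = x + iy (rational x, y with y > 0) into R.

    Returns the equivalent point of R and the list of steps applied,
    in order: ("S", n) means z -> z + n, ("W",) means z -> -1/z.
    """
    x, y = Fraction(x), Fraction(y)
    steps = []
    while True:
        n = -floor(x + HALF)              # integer with -1/2 <= x + n < 1/2
        if n:
            x += n
            steps.append(("S", n))
        if in_R(x, y):
            return (x, y), steps
        norm = x * x + y * y
        x, y = -x / norm, y / norm        # W: z -> -1/z
        steps.append(("W",))

print(reduce_point(0, Fraction(1, 2)))    # i/2 is carried to 2i by W
```

Reading the returned steps from last to first gives a word of the form $S^{n_k}W \cdots W S^{n_1}$ for the transformation that was applied.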




FIGURE 1-2 


A geometric representation of the group $\Gamma$ is given in Fig. 1-2.
Here we have considered the region $R$ as the image of itself under the
identity transformation $I$, and have put $R = R(I)$. The congruent
unshaded region to the left of $R$ is then $R(S^{-1})$, in the sense that
if a point $z'$ of it is equivalent to a point $z$ of $R$, then
$z' = S^{-1}z$. To put it differently, $S^{-1}$ maps $R$ onto this region,
just as $W$ maps $R$ onto the unshaded region $R(W)$ immediately below $R$.
The semicircular arcs are portions of the circles $C(T)$; infinitely
many of them terminate in each rational point on the real axis. If the
drawing and shading were completed, any shaded or unshaded region
could be taken as a fundamental region. Each fundamental region or
"double triangle" is bounded by three arcs, with vertex angles of 0,
$\pi/3$, and $\pi/3$. The heavy arc inside each region indicates the
portion of the boundary which is to be included in the region.


PROBLEMS 
1. Find the point in R to which the point 
38 +20 
8 + 62 
is equivalent, by the method used in the proof of Theorem 1-1. Do you 
see an easier way, for this particular number? 
2. If the term "circle" is used in the broad sense to include straight
lines, show that the transformations of $\Gamma$ send circles into
circles. Under what circumstances are the image circles actually lines?
What can be said about such a line if, for every point $z$ on the
original circle, $\operatorname{Im} z > 0$?

3. Verify that, in the notation of the text, $(SW)^3 = I$.

4. Show that the transformations

$$P:\ z' = 1 - z, \qquad Q:\ z' = \frac{1}{z}$$

generate a group of six elements (the group of anharmonic ratios) of
which a fundamental region is the set of points $z$ such that

$$\operatorname{Im} z > 0, \qquad |z| > 1, \qquad\text{and}\qquad |z - 1| > 1,$$

together with half the boundary of this region, leading from
$(1 + i\sqrt3)/2$ to infinity in one direction. Sketch the analog of
Fig. 1-2 for this group. [Note that the transformations of the group do
not carry $U$ into itself.]


1-4 Reduced definite forms. With the help of the facts now
known about the modular group, we can deal with the question of how
to decide whether two given binary quadratic forms are equivalent.
We must consider separately the essentially different cases in which
the discriminant $\Delta = 4ac - b^2$ of the form $ax^2 + bxy + cy^2$ is
positive or negative. (We put aside the degenerate case in which
$\Delta = 0$.) If $\Delta > 0$, the form is called definite, otherwise
indefinite. The definite forms can be further classified as positive or
negative, according as $a > 0$ or $a < 0$. The reason for this
terminology is that the polynomial $az^2 + bz + c$ associated with a
definite form has nonreal zeros, so that the form

$$y^2\left( a\,\frac{x^2}{y^2} + b\,\frac{x}{y} + c \right)$$

has the same sign as $a$ for every choice of $x$ and $y$ except
$x = y = 0$, while an indefinite form can have values of either sign. We
shall first consider definite forms, restricting our attention to
positive forms, since the treatment of negative forms is almost
identical.

Since the matrix of a form is a little cumbersome, we shall use the
symbol $[a, b, c]$ to designate the form $ax^2 + bxy + cy^2$. It is to be
clearly understood that this is simply an abbreviation, and cannot be
combined with like symbols as matrices can.

Let us consider then a positive definite form $f(x, y) = [a, b, c]$, in
which $\Delta > 0$, $a > 0$, and $c > 0$. For the time being, we do not
require that $a$, $b$, and $c$ be integers. Then the quadratic polynomial
$f(z) = az^2 + bz + c$ has zeros

$$\frac{-b \pm \sqrt{-\Delta}}{2a};$$

of these, we single out the one with positive imaginary part and call
it $\omega$. Thus to the form $[a, b, c]$ there corresponds the point
$\omega$ in the upper half plane. Conversely, each point in $U$
corresponds to exactly one positive form of discriminant $\Delta$. For if
$z_0$ is such a point, and $\bar z_0$ is its complex conjugate, then there
is a unique positive number $\kappa$ such that the quadratic expression
$\kappa(z - z_0)(z - \bar z_0)$ has discriminant $\Delta$. Hence if we
consider only forms of given discriminant $\Delta$ (which is all that is
required in the equivalence problem, since equivalent forms have the
same discriminant), there is a one-to-one correspondence between
points of $U$ and forms of that discriminant. Moreover, if the points
$\omega_1$ and $\omega_2$ are associated with the forms $f_1$ and $f_2$ of
discriminant $\Delta$, and if a transformation $T$ of $\Gamma$ carries $f_1$
into $f_2$, then it carries $\omega_1$ into $\omega_2$. It therefore makes
no difference whether one speaks of the form $f$ or the point $\omega$,
as far as the operations of $\Gamma$ are concerned. We call $\omega$ the
representative of $f$.

It should now be clear how to decide whether or not two forms are 
equivalent. If they do not have the same discriminant, they are not 
equivalent. If they have, they are equivalent if and only if their 
representatives are equivalent, and this can be decided by trans- 
forming the representatives into the fundamental region R, where 
they must be identical to be equivalent. This leads us to define a 
reduced form as one whose representative is in R; reduced forms are 
equivalent if and only if they are identical, and each class of equivalent
forms contains exactly one reduced form. 

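Since

$$\omega = \frac{-b + i\sqrt{\Delta}}{2a}, \qquad |\omega|^2 = \frac{b^2}{4a^2} + \frac{\Delta}{4a^2} = \frac{c}{a}, \qquad (12)$$

$\omega$ is in $R$ if and only if $-\tfrac12 \le -b/2a < \tfrac12$, and either
$c/a > 1$, or $c/a = 1$ and $-\tfrac12 \le -b/2a \le 0$. Simplifying, we have
that $[a, b, c]$ is reduced if and only if either

$$-a < b \le a < c \qquad\text{or}\qquad 0 \le b \le a = c. \qquad (13)$$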
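
As an added illustration (not part of the original text), condition (13) and the representative $\omega$ can be computed directly. The function names `representative` and `is_reduced` below are hypothetical, and the sketch assumes a positive definite form with $a > 0$ and $\Delta = 4ac - b^2 > 0$.

```python
from math import sqrt

def representative(a, b, c):
    """Root of a z^2 + b z + c = 0 lying in the upper half plane."""
    delta = 4 * a * c - b * b
    if a <= 0 or delta <= 0:
        raise ValueError("expected a positive definite form")
    return complex(-b / (2 * a), sqrt(delta) / (2 * a))

def is_reduced(a, b, c):
    """Condition (13): -a < b <= a < c, or 0 <= b <= a = c."""
    return (-a < b <= a < c) or (0 <= b <= a == c)

print(is_reduced(1, 1, 1), is_reduced(2, -1, 2))
```

The second form fails the test because $a = c$ while $b < 0$, so its representative lies on the excluded half of the arc $|z| = 1$.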




PROBLEM 


Prove the assertion, made in the text, that if $\omega_1$ and $\omega_2$
are the representatives of the forms $f_1$ and $f_2$ with discriminant
$\Delta$, and if a $T$ in $\Gamma$ carries $f_1$ into $f_2$, it carries
$\omega_1$ into $\omega_2$.


1-5 Reduction of definite forms. A given form can be transformed
into its equivalent reduced form by exactly the process used in the
proof that $R$ is a fundamental region of $\Gamma$. That is, by a
translation $S^{n_1}$, $\omega$ can be changed into $\omega'$, where
$-\tfrac12 \le \operatorname{Re} \omega' < \tfrac12$; if $\omega'$ is not in
$R$, we begin afresh with $W\omega'$, etc. The translation
$z' = z + n_1$ must be such that

$$-\frac12 \le -\frac{b}{2a} + n_1 < \frac12,$$

or

$$b = 2an_1 + r_1,$$

where $-a < r_1 \le a$. The transformation $z' = z + n_1$ has matrix

$$S^{n_1} = \begin{pmatrix} 1 & n_1 \\ 0 & 1 \end{pmatrix},$$

but we must now revert to the inverse transformation $z = z' - n_1$ to
utilize the results of Section 1-1, which were based on the equations
(2). If we put

$$M = \begin{pmatrix} 1 & -n_1 \\ 0 & 1 \end{pmatrix} = S^{-n_1},$$

then, as we saw earlier, $M$ carries a form with matrix $F$ into one with
matrix

$$G = M^{\top}FM$$

(where $M^{\top}$ denotes the transpose of $M$), so that in this case, if we
let the result of the first translation be $f_1(x, y) = X^{\top}F_1X$, then

$$F_1 = \begin{pmatrix} 1 & 0 \\ -n_1 & 1 \end{pmatrix} F \begin{pmatrix} 1 & -n_1 \\ 0 & 1 \end{pmatrix}.$$

Similarly, if $F_2$ is the result of applying the inversion $W$ to $F_1$,
then

$$F_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} F_1 \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

A simple calculation shows that, if $f_1 = [a, b, c]$, then
$f_2 = [c, -b, a]$.




Thus we have the following algorithm for reducing $f = [a, b, c]$: find
$n_1$ and $r_1$ such that

$$b = 2an_1 + r_1, \qquad -a < r_1 \le a,$$

and compute $f_1 = [a_1, b_1, c_1]$, where

$$F_1 = \begin{pmatrix} 1 & 0 \\ -n_1 & 1 \end{pmatrix} F \begin{pmatrix} 1 & -n_1 \\ 0 & 1 \end{pmatrix},$$

so that $f_1 = [a,\ b - 2an_1,\ n_1^2a - bn_1 + c]$. If $f_1$ is not reduced,
put $f_2 = [c_1, -b_1, a_1] = [a_2, b_2, c_2]$. If $f_2$ is not reduced,
repeat the entire procedure. For some $k$, $f_k$ will be reduced.
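
The algorithm just described can be sketched as follows. This Python sketch is an added illustration under stated assumptions: the form is integral and positive definite, and the name `reduce_form` is hypothetical. The translation step chooses $n_1$ with $b = 2an_1 + r_1$, $-a < r_1 \le a$, and the inversion step replaces $[a, b, c]$ by $[c, -b, a]$.

```python
def reduce_form(a, b, c):
    """Reduce a positive definite integral form [a, b, c]."""
    if a <= 0 or 4 * a * c - b * b <= 0:
        raise ValueError("expected a positive definite form")
    while True:
        # Translation step: b = 2*a*n + r with -a < r <= a.
        n = -((a - b) // (2 * a))
        b, c = b - 2 * a * n, n * n * a - b * n + c
        # Condition (13), given that -a < b <= a already holds.
        if a < c or (a == c and b >= 0):
            return a, b, c
        # Inversion step W: [a, b, c] -> [c, -b, a].
        a, b, c = c, -b, a

print(reduce_form(5, 16, 13))   # a form of discriminant 4
```

For example, $[5, 16, 13]$ has discriminant $4 \cdot 5 \cdot 13 - 16^2 = 4$, and the steps are $[5, 16, 13] \to [5, -4, 1] \to [1, 4, 5] \to [1, 0, 1]$.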

The discussion thus far has been valid for positive definite forms 
with arbitrary real coefficients. For the remainder of this section and 
the next, we consider only integral forms, that is, those with integral 
coefficients. 


THEOREM 1-3. There are only finitely many classes of integral 
definite forms of given discriminant. 


Proof: To each class there belongs just one reduced form $[a, b, c]$
satisfying the conditions (13). Since

$$4a^2 \le 4ac = \Delta + b^2 \le \Delta + a^2,$$

the inequality $0 < a \le \sqrt{\Delta/3}$ holds for each reduced form, so
that there are only finitely many possible values of $a$ for fixed
$\Delta$. Since $|b| \le a$, the same is true of $b$, and for each pair
$a$, $b$ there is at most one integer $c$ such that $4ac - b^2 = \Delta$.

If, for example, $\Delta = 3$, then $0 < a \le 1$, so that $a = 1$ and hence
$b = 0$ or 1; from this it is easily seen that the only integral reduced
form of discriminant 3 is $x^2 + xy + y^2$. There is also just one class
of discriminant 4, and its reduced form is $x^2 + y^2$.
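
The finiteness argument is effective, and the search it describes can be sketched in code. This Python sketch is an added illustration; the name `reduced_forms` is hypothetical, and only positive forms are listed. It runs over $0 < a \le \sqrt{\Delta/3}$ and $-a < b \le a$, and keeps the triples satisfying (13).

```python
def reduced_forms(delta):
    """All reduced positive integral forms [a, b, c] with 4ac - b^2 = delta."""
    forms = []
    a = 1
    while 3 * a * a <= delta:             # a <= sqrt(delta / 3)
        for b in range(-a + 1, a + 1):    # -a < b <= a
            num = delta + b * b           # must equal 4ac
            if num % (4 * a):
                continue
            c = num // (4 * a)
            if a < c or (a == c and b >= 0):
                forms.append((a, b, c))
        a += 1
    return forms

print(reduced_forms(3), reduced_forms(4))
```

The two calls reproduce the examples in the text: the only reduced forms of discriminants 3 and 4 are $[1, 1, 1]$ and $[1, 0, 1]$.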


PROBLEMS 


1. Find all reduced integral definite forms of discriminant $\Delta < 20$.
2. Find the reduced form equivalent to $[117, 103, 100]$.


1-6 Representations by definite forms. If a transformation of $\Gamma$
leaves a quadratic form unchanged, it is called an automorph of the 
form. Since an automorph also leaves the representative of the form 
unchanged, and is the only kind of transformation which does, the 
following theorem is an easy consequence of Theorem 1-2. 




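THEOREM 1-4. The only automorphs of $a(x^2 + y^2)$ are

$$\begin{cases} x = \pm x', \\ y = \pm y', \end{cases} \qquad\text{and}\qquad \begin{cases} x = \pm y', \\ y = \mp x'. \end{cases}$$

The only automorphs of $a(x^2 + xy + y^2)$ are

$$\begin{cases} x = \pm x', \\ y = \pm y', \end{cases} \qquad \begin{cases} x = \mp y', \\ y = \pm x' \pm y', \end{cases} \qquad\text{and}\qquad \begin{cases} x = \pm x' \pm y', \\ y = \mp x'. \end{cases}$$

Any positive reduced form distinct from these two has only the
automorphs

$$\begin{cases} x = \pm x', \\ y = \pm y'. \end{cases}$$
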

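An integer $n$ is said to be properly representable by an integral form
$[a, b, c]$ of discriminant $\Delta$ if there are relatively prime integers
$\alpha$, $\gamma$ such that $a\alpha^2 + b\alpha\gamma + c\gamma^2 = n$. For
such $\alpha$, $\gamma$, there are $\beta_0$ and $\delta_0$ such that
$\alpha\delta_0 - \beta_0\gamma = 1$, and, in fact,

$$\alpha\delta - \beta\gamma = 1$$

if and only if, for some integer $t$,

$$\beta = \beta_0 + \alpha t, \qquad \delta = \delta_0 + \gamma t.$$

If we make the substitution

$$\begin{aligned} x &= \alpha x' + \beta y', \\ y &= \gamma x' + \delta y', \end{aligned} \qquad (14)$$

then $[a, b, c]$ goes into a form $[n, m, l]$ with first coefficient $n$, by
equations (5). Also by (5),

$$\begin{aligned} m &= 2a\alpha(\beta_0 + \alpha t) + b(\alpha\delta_0 + \alpha\gamma t + \beta_0\gamma + \alpha\gamma t) + 2c\gamma(\delta_0 + \gamma t) \\ &= 2a\alpha\beta_0 + b(\alpha\delta_0 + \beta_0\gamma) + 2c\gamma\delta_0 + 2nt, \end{aligned}$$

so that $m$ is determined modulo $2n$. Choose $m$ so that $0 \le m < 2n$;
then $t$ is fixed, $\beta$ and $\delta$ are unique, and $l$ is determined by
the discriminant:

$$4ln - m^2 = \Delta.$$
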
THEOREM 1-5. Let $\alpha$, $\gamma$ be a proper representation of $n > 0$ by
the integral form $[a, b, c]$ of discriminant $\Delta$. Then there are
unique integers $\beta$ and $\delta$ such that $\alpha\delta - \beta\gamma = 1$,
and the substitution (14) replaces $[a, b, c]$ by the equivalent form
$[n, m, l]$, where $0 \le m < 2n$, $m$ satisfies the congruence

$$m^2 \equiv -\Delta \pmod{4n}, \qquad (15)$$

and

$$l = \frac{m^2 + \Delta}{4n}. \qquad (16)$$


Thus to each proper representation of $n$ by $[a, b, c]$ there corresponds
a unique form which has first coefficient $n$ and satisfies certain
auxiliary conditions. The appropriate converse, which we now consider,
gives the number of such representations, and provides an effective
method of finding them. If $m$ is a solution of (15) and $0 \le m < 2n$,
then $4n - m$ is also a root, and $2n < 4n - m \le 4n$. We shall refer
to $m$ as a minimum root if $0 \le m < 2n$.


THEOREM 1-6. Let $w(f)$ be the number of automorphs of $f = [a, b, c]$,
an integral positive form of discriminant $\Delta$. Let $n$ be a positive
integer. Corresponding to each minimum root $m$ of the congruence (15),
determine $l$ by equation (16). Then the number of proper
representations of $n$ by $f$ is $w(f)$ times the number of such forms
$[n, m, l]$ which are equivalent to $f$. In particular, if there is only
one class of discriminant $\Delta$, the number of proper representations
is $w(f)$ times the number of minimum roots of (15).


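Proof: Suppose that $g = [n, m, l]$ is a form of the type described in
the theorem. Then if $f$ is not equivalent to $g$, Theorem 1-5 shows
that there is no representation of $n$ by $f$ corresponding to the
minimum root $m$. If $f$ is equivalent to $g$, let $T$ be the matrix of a
substitution which replaces $f$ by $g$, and let $A$ be the matrix of an
automorph of $f$. Then, writing $M^{\top}$ for the transpose of a matrix
$M$,

$$G = T^{\top}FT \qquad\text{and}\qquad F = A^{\top}FA,$$

so

$$(AT)^{\top}F(AT) = T^{\top}A^{\top}FAT = T^{\top}FT = G,$$

so that $AT$ is also the matrix of a substitution which carries $f$ into
$g$. Conversely, if for any $U$,

$$G = U^{\top}FU,$$

then $U^{\top}FU = T^{\top}FT$, and

$$F = (T^{\top})^{-1}U^{\top}FUT^{-1},$$

so that $UT^{-1}$ is the matrix $A$ of an automorph of $f$, and $U = AT$.
Hence there are exactly $w(f)$ substitutions which replace $f$ by $g$.
If

$$T = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix},$$

and $f$ has only two automorphs (see Theorem 1-4), then

$$AT = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} -\alpha & -\beta \\ -\gamma & -\delta \end{pmatrix},$$

and $\alpha$, $\gamma$ and $-\alpha$, $-\gamma$ give two distinct proper
representations, since $(\alpha, \gamma) = 1$ and therefore $\alpha$ and
$\gamma$ are not both zero. If $f = a(x^2 + y^2)$, then

$$AT = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} -\alpha & -\beta \\ -\gamma & -\delta \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} -\gamma & -\delta \\ \alpha & \beta \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} \gamma & \delta \\ -\alpha & -\beta \end{pmatrix},$$

and the representations $\alpha, \gamma$; $-\alpha, -\gamma$;
$-\gamma, \alpha$; and $\gamma, -\alpha$ are again distinct. If
$f = a(x^2 + xy + y^2)$, then $AT$ is one of the matrices

$$\pm\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}, \qquad \pm\begin{pmatrix} -\gamma & -\delta \\ \alpha + \gamma & \beta + \delta \end{pmatrix}, \qquad \pm\begin{pmatrix} \alpha + \gamma & \beta + \delta \\ -\alpha & -\beta \end{pmatrix},$$

and these also lead to distinct representations.

If there is only one class of discriminant $\Delta$, then $f$ and $g$ are
necessarily equivalent, so that all minimum roots of (15) lead to
representations. The proof is complete.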


In the case of primitive forms (those having relatively prime
coefficients), $w(f)$ depends only on $\Delta$: $w(f) = 6$, 4, or 2
according as $\Delta$ is 3, 4, or larger than 4. If $f(x, y) = x^2 + y^2$,
so that $\Delta = 4$, then $m$ must be even to satisfy (15). Let
$m = 2m_1$; then $m_1^2 \equiv -1 \pmod n$, and $0 \le m < 2n$ means
$0 \le m_1 < n$, so that the number of proper representations of $n$ as a
sum of two squares is four times the number of solutions of the
congruence $u^2 \equiv -1 \pmod n$. This result was obtained in
Theorem 7-5, Volume I, by quite different methods.
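
The statement about sums of two squares can be checked numerically for small $n$. The following Python sketch is an added illustration, not part of the original text; the names `proper_reps_two_squares` and `roots_of_minus_one` are hypothetical, and the count is done by brute force rather than by the method of Theorem 1-6.

```python
from math import gcd, isqrt

def proper_reps_two_squares(n):
    """All (x, y) with x^2 + y^2 = n and gcd(x, y) = 1."""
    reps = []
    r = isqrt(n)
    for x in range(-r, r + 1):
        y2 = n - x * x
        y = isqrt(y2)
        if y * y != y2:
            continue
        for yy in {y, -y}:                # a single value when y == 0
            if gcd(x, yy) == 1:
                reps.append((x, yy))
    return reps

def roots_of_minus_one(n):
    """Solutions u of u^2 = -1 (mod n) with 0 <= u < n."""
    return [u for u in range(n) if (u * u + 1) % n == 0]

for n in (5, 25, 65):
    print(n, len(proper_reps_two_squares(n)), 4 * len(roots_of_minus_one(n)))
```

For each $n$ printed, the two counts should agree, as the text asserts.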


PROBLEMS 


1. Find $\beta$, $\delta$, $m$, $l$ of Theorem 1-5 corresponding to the proper
representation 3, 5 of 118 by $[2, -5, 7]$.

2. What is the number of proper representations of 28 by $[1, 1, 2]$?
Find them.

3. Use Theorem 1-6 to discuss the proper representability of 10 by
$[2, 1, 2]$.

4. Show that every prime congruent to 1 or 3 (mod 8) has a unique
proper representation in the form $x^2 + 2y^2$ with $x > 0$, $y > 0$. More
generally, show that if $n$ is the product of powers of $r$ such primes,
then $n$ has $2^{r+1}$ proper representations in this form.


1-7 Indefinite forms. The behavior of indefinite binary forms is 
remarkably different from that of forms with positive discriminant. 
For example, any integral indefinite form whose discriminant is not 
the negative of a square has infinitely many automorphs, and there- 
fore represents any integer in infinitely many ways if it represents it 
at all. Moreover, there seems to be no natural way to pick out a 
unique reduced form in each equivalence class, although we shall find 
a finite set of canonical forms in the case of integral forms. 

Hereafter we restrict attention to integral forms $[a, b, c]$, and put

$$D = -\Delta = b^2 - 4ac > 0.$$

If $D$ is a square, then $[a, b, c]$ factors into two linear factors with
integral coefficients. We dismiss this degenerate case, and hereafter
require that $D$ be a nonsquare integer. Finally, for the sake of
simplicity we consider only the case that $[a, b, c]$ is primitive. We see
from equations (5) (proof by contradiction) that any form $[A, B, C]$
equivalent to a primitive form is again primitive.

As before, there is associated with $[a, b, c]$ the quadratic equation

$$az^2 + bz + c = 0,$$

which this time has two real roots, say

$$\omega_1 = \frac{-b + \sqrt D}{2a}, \qquad \omega_2 = \frac{-b - \sqrt D}{2a}.$$

It is easily verified that a transformation of the modular group which
sends $[a, b, c]$ into $[a', b', c']$ sends $\omega_1$ into $\omega_1'$ and
$\omega_2$ into $\omega_2'$, and never $\omega_1$ into $\omega_2'$. We call
$\omega_1$ the first root, and $\omega_2$ the second root.

As C. Hermite noticed, there is also associated with

$$[a, b, c] = a(x - \omega_1y)(x - \omega_2y)$$

a family of definite forms

$$\varphi_t(x, y) = \frac{a}{2t}(x - \omega_1y)^2 + \frac{at}{2}(x - \omega_2y)^2,$$

where $t > 0$ is a real parameter. A simple calculation shows that the
discriminant of $\varphi_t(x, y)$ is $D$, for every $t > 0$. Reverting to
the quotient variable $z = x/y$, we find the zeros of $\varphi_t(z)$ to be
those points $z_t$ such that

$$(z_t - \omega_1)^2 = -t^2(z_t - \omega_2)^2,$$

or

$$z_t - \omega_1 = \pm it(z_t - \omega_2).$$

The transformation $z' = iz$ rotates the plane about the origin through
the angle $\pi/2$; it follows from the last equation that the line
segment connecting $z_t$ with $\omega_1$ is perpendicular to the segment
connecting $z_t$ with $\omega_2$, and hence that $z_t$ lies on the circle
having as diameter the segment which connects $\omega_1$ and $\omega_2$.
If, as usual, we take that root $z_t$ which has positive imaginary part
as the representative of $\varphi_t$, then we have associated with
$[a, b, c]$ the semicircle $\Sigma$ in $U$ connecting $\omega_1$ and
$\omega_2$. As $t$ varies from $0^+$ to $\infty$, $z_t$ describes $\Sigma$
from $\omega_1$ to $\omega_2$; we can think of the semicircle as oriented
with this sense, inasmuch as the orientation is preserved under
transformations of $\Gamma$. This orientation is necessary, since
otherwise there would be no way of distinguishing the (usually
inequivalent) forms $[a, b, c]$ and $[-a, -b, -c]$. The form is now
completely described by specifying its oriented semicircle $\Sigma$ and
its discriminant $-D$.

An indefinite form $f$ will be called reduced if the associated
semicircle intersects the fundamental region $R$ considered earlier.
Thus $f$ is reduced if and only if the definite form $\varphi_t$ is
reduced for some $t$. The fact that any indefinite form is equivalent
to a reduced form is an immediate consequence of the fact that
$\varphi_1$, for example, is equivalent to a reduced definite form: the
transformation which carries $\varphi_1$ into a reduced form also carries
$f$ into a reduced form. The difficulty lies in showing that each
indefinite integral form is equivalent to only finitely many reduced
forms. To do this, we must first examine an important subgroup of
$\Gamma$ which is intimately connected with $f$.




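1-8 The automorphs of indefinite forms. A transformation of $\Gamma$
which leaves $[a, b, c]$ unchanged also leaves $\omega_1$ and $\omega_2$
fixed. The fixed points of the transformation

$$z' = \frac{\alpha z + \beta}{\gamma z + \delta}$$

are those points $\omega$ such that

$$\omega = \frac{\alpha\omega + \beta}{\gamma\omega + \delta},$$

or

$$\gamma\omega^2 + (\delta - \alpha)\omega - \beta = 0. \qquad (17)$$

Suppose that the roots of this equation are $\omega_1$ and $\omega_2$.
These numbers are also the roots of the equation
$a\omega^2 + b\omega + c = 0$; since $(a, b, c) = 1$, it follows that for
some integer $u$,

$$\gamma = au, \qquad (18)$$

$$\delta - \alpha = bu, \qquad -\beta = cu. \qquad (19)$$

Putting $\delta + \alpha = t$, we have

$$\alpha = \frac{t - bu}{2}, \qquad \delta = \frac{t + bu}{2}, \qquad (20)$$

where $t$ and $u$ are such that

$$\alpha\delta - \beta\gamma = \frac{t^2 - b^2u^2}{4} + \frac{4acu^2}{4} = 1,$$

or

$$t^2 - Du^2 = 4. \qquad (21)$$

Conversely, if $t$ and $u$ are solutions of (21), and $\alpha$, $\beta$,
$\gamma$, and $\delta$ are determined by equations (18) through (20), then
(17) reduces to $u(a\omega^2 + b\omega + c) = 0$ and
$\alpha\delta - \beta\gamma = 1$. This proves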

THEOREM 1-7. The set of all automorphs of the primitive indefinite
form $[a, b, c]$ is given by the set of all matrices

$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$$

with $\alpha$, $\beta$, $\gamma$, and $\delta$ determined by equations (18),
(19), and (20), where $t$ and $u$ run over the integral solutions of the
Pell equation (21).
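
Equations (18) through (20) can be checked on a small example. The Python sketch below is an added illustration; the names `automorph` and `substitute` are hypothetical. It builds the matrix from a solution $(t, u)$ of (21) and verifies that the corresponding substitution leaves the form unchanged, using the usual coefficient formulas for a linear substitution.

```python
def automorph(a, b, c, t, u):
    """Matrix (alpha, beta, gamma, delta) from equations (18)-(20)."""
    D = b * b - 4 * a * c
    if t * t - D * u * u != 4:
        raise ValueError("(t, u) does not satisfy t^2 - D u^2 = 4")
    return (t - b * u) // 2, -c * u, a * u, (t + b * u) // 2

def substitute(form, m):
    """Apply x = alpha x' + beta y', y = gamma x' + delta y' to [a, b, c]."""
    a, b, c = form
    al, be, ga, de = m
    return (a * al * al + b * al * ga + c * ga * ga,
            2 * a * al * be + b * (al * de + be * ga) + 2 * c * ga * de,
            a * be * be + b * be * de + c * de * de)

f = (1, 1, -1)                 # D = 5
A = automorph(*f, 3, 1)        # 3^2 - 5 * 1^2 = 4
print(A, substitute(f, A) == f)
```

Here $[1, 1, -1]$ and the solution $(t, u) = (3, 1)$ are chosen only as a convenient example.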




Originally, automorphs were defined as substitutions giving $z$ in
terms of $z'$, while we have here used the inverse transformation giving
$z'$ in terms of $z$. But if $F = A^{\top}FA$, then
$F = (A^{-1})^{\top}FA^{-1}$ (where $A^{\top}$ denotes the transpose of $A$),
so that the inverse of an automorph is also an automorph, and the set
of all automorphs coincides with the set of all inverse automorphs.
This fact has much greater significance than in its application above.
For since the product of two automorphs is again an automorph, the
automorphs of $f$ form a subgroup of $\Gamma$, which we shall designate by
$\Gamma_A(f)$. (The elements of $\Gamma_A(f)$ will be taken sometimes as
transformations and sometimes as their matrices. The ambiguity
resulting from the fact that the matrices $A$ and $-A$ correspond to
different substitutions in the form but to the same fractional
transformation of $\Gamma$ should cause no difficulty if the reader
remains aware of it.) Using well-known properties of the solutions of
Pell's equation,* $\Gamma_A(f)$ can be characterized as follows.


THEOREM 1-8. $\Gamma_A(f)$ is the infinite cyclic group generated by the
matrix

$$V = \begin{pmatrix} \tfrac12(t_0 - bu_0) & -cu_0 \\ au_0 & \tfrac12(t_0 + bu_0) \end{pmatrix};$$

if $A$ is any automorph of $f$, then $A = V^n$ for some integer $n$,
positive, negative or zero. Here $t_0$, $u_0$ is the minimal positive
solution of equation (21).

(The ambiguity mentioned above is exemplified here: every
transformation $z' = (\alpha z + \beta)/(\gamma z + \delta)$ of $\Gamma_A(f)$
can be made to have matrix $V^n$, but the set of all substitutions which
leave $f$ fixed is given by $X = \pm V^nX'$.)


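*Pell's equation is discussed in Volume I, Chapter 8. The minimal
positive solution is described in Theorem 8-7.

Proof: According to Theorem 1-7, $\Gamma_A(f)$ is the group of matrices

$$\begin{pmatrix} \tfrac12(t - bu) & -cu \\ au & \tfrac12(t + bu) \end{pmatrix}, \qquad t^2 - Du^2 = 4,$$

so it is to be shown that each of these matrices is a power of $V$.
If we put

$$\left( \tfrac12(t_0 + u_0\sqrt D) \right)^{n+1} = \tfrac12(t_n + u_n\sqrt D)$$

for each $n$, then

$$\tfrac12(t_{n+1} + u_{n+1}\sqrt D) = \tfrac14(t_n + u_n\sqrt D)(t_0 + u_0\sqrt D) = \tfrac14(t_0t_n + Du_0u_n) + \tfrac14(t_0u_n + t_nu_0)\sqrt D,$$

so that

$$t_{n+1} = \tfrac12(t_0t_n + Du_0u_n), \qquad u_{n+1} = \tfrac12(t_nu_0 + t_0u_n).$$

Now suppose that

$$V^{n+1} = \begin{pmatrix} \tfrac12(t_n - bu_n) & -cu_n \\ au_n & \tfrac12(t_n + bu_n) \end{pmatrix},$$

an assumption which is correct for $n = 0$. Then

$$\begin{aligned} V^{n+2} = V \cdot V^{n+1} &= \begin{pmatrix} \tfrac12(t_0 - bu_0) & -cu_0 \\ au_0 & \tfrac12(t_0 + bu_0) \end{pmatrix} \begin{pmatrix} \tfrac12(t_n - bu_n) & -cu_n \\ au_n & \tfrac12(t_n + bu_n) \end{pmatrix} \\ &= \begin{pmatrix} \tfrac12(t_{n+1} - bu_{n+1}) & -\tfrac12c(u_0t_n + u_nt_0) \\ \tfrac12a(u_0t_n + u_nt_0) & \tfrac12(t_{n+1} + bu_{n+1}) \end{pmatrix} \\ &= \begin{pmatrix} \tfrac12(t_{n+1} - bu_{n+1}) & -cu_{n+1} \\ au_{n+1} & \tfrac12(t_{n+1} + bu_{n+1}) \end{pmatrix}, \end{aligned}$$

and by induction, $V^n$ is of the supposed form for all $n > 0$.
Similarly, it can be shown that

$$V^{-n} = \begin{pmatrix} \tfrac12(t_{n-1} + bu_{n-1}) & cu_{n-1} \\ -au_{n-1} & \tfrac12(t_{n-1} - bu_{n-1}) \end{pmatrix},$$

which is the matrix belonging to the solution $t_{n-1}$, $-u_{n-1}$ of
(21), so that $V^n$ is also of the supposed form for all $n < 0$. Hence
the matrix corresponding to any solution of equation (21) is a power
of $V$, and the theorem is proved.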
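
The induction in the proof can be checked numerically. The Python sketch below is an added illustration with hypothetical names (`mat_mul`, `pell_matrix`, `check_powers`); it compares successive powers of $V$ with the matrices built from the recurrence $t_{n+1} = \tfrac12(t_0t_n + Du_0u_n)$, $u_{n+1} = \tfrac12(t_nu_0 + t_0u_n)$.

```python
def mat_mul(p, q):
    """Product of 2x2 matrices stored as (a, b, c, d) in row order."""
    (a, b, c, d), (e, f, g, h) = p, q
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

def pell_matrix(a, b, c, t, u):
    """Matrix attached to a solution (t, u) of t^2 - D u^2 = 4."""
    return ((t - b * u) // 2, -c * u, a * u, (t + b * u) // 2)

def check_powers(a, b, c, t0, u0, count=6):
    """Check that V^(n+1) matches (t_n, u_n) for n = 0, ..., count - 1."""
    D = b * b - 4 * a * c
    V = pell_matrix(a, b, c, t0, u0)
    power, t, u = V, t0, u0            # V^1 corresponds to (t_0, u_0)
    for _ in range(count):
        if power != pell_matrix(a, b, c, t, u) or t * t - D * u * u != 4:
            return False
        power = mat_mul(V, power)
        t, u = (t0 * t + D * u0 * u) // 2, (t * u0 + t0 * u) // 2
    return True

print(check_powers(1, 1, -1, 3, 1))
```

The check works for any solution $(t_0, u_0)$ of (21), not only the minimal one; minimality is needed only for the claim that every automorph is a power of $V$.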


As usual, it is useful to know a fundamental region of $\Gamma_A(f)$.


THEOREM 1-9. Suppose that the perpendicular bisector $C_0$ of the
segment $l$ joining $\omega_1$ and $\omega_2$ is mapped by $V$ into the
circle $C_1$. Then $C_1$ does not intersect $C_0$, and the (infinite)
region between them, together with $C_1$, $\omega_1$, and $\omega_2$, is a
fundamental region of $\Gamma_A(f)$.


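Proof: If the arbitrary transformation

$$z' = T(z) = (\alpha z + \beta)/(\gamma z + \delta)$$

has the distinct fixed points $z_1$ and $z_2$, then by dividing
$z' - z_1$ by $z' - z_2$ we get

$$\begin{aligned} \frac{z' - z_1}{z' - z_2} &= \frac{\alpha z + \beta - z_1(\gamma z + \delta)}{\alpha z + \beta - z_2(\gamma z + \delta)} = \frac{(\alpha - \gamma z_1)z + (\beta - \delta z_1)}{(\alpha - \gamma z_2)z + (\beta - \delta z_2)} \\ &= \frac{\alpha - \gamma z_1}{\alpha - \gamma z_2} \cdot \frac{z + (\beta - \delta z_1)/(\alpha - \gamma z_1)}{z + (\beta - \delta z_2)/(\alpha - \gamma z_2)}. \end{aligned}$$

In the case at hand, $T$ is the transformation

$$Vz = \frac{\tfrac12(t_0 - bu_0)z - cu_0}{au_0z + \tfrac12(t_0 + bu_0)}$$

with fixed points $\omega_1$ and $\omega_2$, and

$$\frac{\alpha - \gamma z_1}{\alpha - \gamma z_2} = \frac{\tfrac12(t_0 - bu_0) - au_0(-b + \sqrt D)/2a}{\tfrac12(t_0 - bu_0) - au_0(-b - \sqrt D)/2a} = \frac{t_0 - \sqrt D\,u_0}{t_0 + \sqrt D\,u_0}.$$

We put

$$K = \frac{t_0 - \sqrt D\,u_0}{t_0 + \sqrt D\,u_0}$$

and, since $T(z_j) = z_j$ gives $\beta - \delta z_j = -z_j(\alpha - \gamma z_j)$
for $j = 1, 2$, have for $V$ the representation

$$\frac{z' - \omega_1}{z' - \omega_2} = K\,\frac{z - \omega_1}{z - \omega_2}. \qquad (22)$$

It follows that $V^n$ is the same transformation with $K$ replaced by
$K^n$; this could be used to give a second proof of Theorem 1-8.

By its definition, $K$ is a real number between 0 and 1. Since the
perpendicular bisector $C_0$ of the segment $l$ joining $\omega_1$ and
$\omega_2$ has the equation $|z - \omega_1| = |z - \omega_2|$, $V^n$
transforms it into $|z' - \omega_1| = K^n|z' - \omega_2|$, as we see by
taking absolute values in (22). If we put $z = x + iy$, the last equation
becomes

$$C_n:\quad (x - \omega_1)^2 + y^2 = K^{2n}\left( (x - \omega_2)^2 + y^2 \right),$$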


and it is a matter of simple analytic geometry to prove the following
assertions: for positive $n$, $C_n$ is a circle with its center on the
real axis, on the extension through $\omega_1$ of $l$; it contains
$\omega_1$ in its interior; it lies entirely on that side of $C_0$ on
which $\omega_1$ lies; and its radius approaches zero as $n$ increases.
For negative $n$, the circles $C_n$ lie on the other side of $C_0$,
contain $\omega_2$, and close down on $\omega_2$ as $|n|$ increases. Some of
these circles are shown in Fig. 1-3. The lightly shaded region
$R_A(1)$, which is the region described in Theorem 1-9, is the set of
points $z$ such that

$$K \le \left| \frac{z - \omega_1}{z - \omega_2} \right| < 1,$$

which is the region between $C_1$ and $C_0$, including $C_1$. In general,
$V^n$ transforms $R_A(1)$ into the region $R_A(V^n)$ between $C_n$ and
$C_{n+1}$, including $C_{n+1}$. Since the entire plane, excluding
$\omega_1$ and $\omega_2$, is covered in this fashion, and no point is in
two such regions, any one of them, together with $\omega_1$ and
$\omega_2$, is a fundamental region of $\Gamma_A(f)$, and the proof is
complete.

FIGURE 1-3


We are concerned here only with the upper half-plane $U$; relative
to this, a fundamental region of $\Gamma_A(f)$ is that portion of any one
of the above regions which lies in $U$.

In the next section it will be convenient to have slightly more
freedom in choosing a fundamental region of $\Gamma_A(f)$. We get this by
noticing that, instead of beginning with the line $C_0$, we could have
started with any member of the family of circles

$$\left| \frac{z - \omega_1}{z - \omega_2} \right| = c. \qquad (23)$$

For fixed $c > 0$, a fundamental region $R_A(c, 1)$ would then be the
ring between the circle (23), which we might call $C_0(c)$, and its
transform

$$C_1(c):\quad \left| \frac{z - \omega_1}{z - \omega_2} \right| = Kc;$$

the argument given above carries through with no change except for
the introduction of a factor $c$ in certain equations. Such a region is
shown heavily shaded in Fig. 1-3.


1-9 Reduction of indefinite forms. The semicircle $\Sigma$ representing
the form $f$ is the upper half of the circle given parametrically by

$$\left(\frac{z-\omega_1}{z-\omega_2}\right)^2 = -t^2, \qquad 0 < t < \infty.$$

The generating automorph $V$, given by equation (22), changes $\Sigma$
into the upper half of the circle

$$\left(\frac{z'-\omega_1}{z'-\omega_2}\right)^2 = -Kt^2 = -(t\sqrt{K})^2, \qquad 0 < t < \infty,$$

which is the same circle with a different parameter. In other words,
$\Sigma$ is transformed into itself by $V$, and hence by any element of $\Gamma_A(f)$,
in the sense that each point of $\Sigma$ goes into some other point of $\Sigma$,
although no points of $\Sigma$ remain fixed except $\omega_1$ and $\omega_2$. In fact, that
arc of $\Sigma$ which lies in a fundamental region $R_A(c, 1)$ is mapped by $V^n$
onto the arc of $\Sigma$ which lies in the region $R_A(c, V^n)$, so that these
various arcs are equivalent with respect to $\Gamma_A(f)$. Hence they are
also equivalent with respect to the larger group $\Gamma$.

Now imagine $\Sigma$ drawn in Fig. 1-3. For suitable choice of $c$, the
circle $C_0(c)$ defined in the last section intersects $\Sigma$ at a point on the
boundary of one of the transforms of $R$, and this is then also true of
the equivalent point which is the intersection of $C_1(c)$ and $\Sigma$. The
arc between these two points is thus broken up by the boundaries of
the double triangles in Fig. 1-2 into a finite number, say $\mu$, of smaller
arcs. If these short arcs are transformed back into $R$ by suitable
operations of $\Gamma$, then every point of $\Sigma$ is equivalent to some point on
each of these new arcs; in other words, there are precisely $\mu$ elements
of $\Gamma$ which transform $\Sigma$ into a semicircle intersecting $R$. Hence
there are precisely $\mu$ reduced forms equivalent to $f$.


THEOREM 1-10. There are only finitely many reduced forms in any 
equivalence class of integral primitive indefinite forms. 


Using the definition of reduced form, it is simple to characterize
reduced forms in terms of their coefficients. For clearly $[a, b, c]$ is
reduced if and only if one or both of the points $\rho$ and $-\rho^2$ are inside
the semicircular region bounded by $\Sigma$, or if $\rho$ is on $\Sigma$. The points
below $\Sigma$ in $U$ are the points $z = x + iy$ such that

$$a\bigl(a(x^2 + y^2) + bx + c\bigr) < 0.$$

Since $\rho$ and $-\rho^2$ have the coordinates

$$x = \mp\frac{1}{2}, \qquad y = \frac{\sqrt{3}}{2},$$

we have that $f$ is reduced if and only if either

$$a(2a \pm b + 2c) < 0 \quad\text{or}\quad 2a - b + 2c = 0. \tag{24}$$
To find the set of reduced forms of the class containing a given form
$[a, b, c]$, the procedure outlined for definite forms may first be used
to reduce $\varphi(x, y) = \tfrac{a}{2}(x - \omega_1 y)^2 + \tfrac{a}{2}(x - \omega_2 y)^2 = ax^2 + bxy + (b^2 + D)y^2/4a$;
the transformation which reduces $\varphi(x, y)$ also
reduces $[a, b, c]$, say to $[a_1, b_1, c_1]$. Thus the semicircle $\Sigma_1$ representing
$[a_1, b_1, c_1]$ intersects the fundamental region $R$ of $\Gamma$, either in an
arc or in the single point $\rho$. Starting from a point on $\Sigma_1$ in $R$, move
along $\Sigma_1$ in the direction in which it is oriented. At the point at
which $\Sigma_1$ leaves $R$, it enters one of the regions

$$R(S^{-1}),\ R(S^{-1}W),\ R(WSW),\ R(WS),\ R(W),\ R(WS^{-1}),\ R(WS^{-1}W),\ R(SW),\ \text{or}\ R(S),$$

since these are the only regions adjacent to $R$ (cf. Fig. 1-2). If it
enters $R(T_1)$, then $T_1^{-1}$ sends $\Sigma_1$ into a new semicircle $\Sigma_2$ (associated
with $[a_2, b_2, c_2]$) which has an arc in $R$, and this arc is the image
under $T_1^{-1}$ of the portion of $\Sigma_1$ in $R(T_1)$. The same argument can
now be applied to $\Sigma_2$, leading to a $\Sigma_3$ (associated with $[a_3, b_3, c_3]$)
which has an arc in $R$, and this arc is the image under $T_2^{-1}T_1^{-1}$ of
the arc of $\Sigma_1$ next encountered in moving along $\Sigma_1$ in the positive
direction. If the process is repeated $\mu$ times, $\Sigma_1$ and $[a_1, b_1, c_1]$ will
recur.

It is rather the exceptional case that $\Sigma$ passes through $\rho$ or $-\rho^2$.
If it does not, the array of possible transformations listed above
simplifies: the only $T$'s to consider are then $S$, $S^{-1}$, and $W$. For
example, consider the reduced form $[2, -4, -1]$, where $\omega_1 = (2 + \sqrt{6})/2$,
$\omega_2 = (2 - \sqrt{6})/2$. $\Sigma_1$ goes from $R$ to $R(W)$, so we
make the inversion $W^{-1} = W$, or $z = -1/z'$. This replaces $[a, b, c]$
by $[c, -b, a]$, so here $[a_2, b_2, c_2] = [-1, 4, 2]$. $\Sigma_2$ goes from $R$ to
$R(S)$, so we make the translation $S^{-1}$, or $z = z' + 1$. In general
this replaces $[a, b, c]$ by $[a, 2a + b, a + b + c]$, so here $[a_3, b_3, c_3] = [-1, 2, 5]$.
$\Sigma_3$ also goes from $R$ to $R(S)$, and we get $[a_4, b_4, c_4] = [-1, 0, 6]$.
$\Sigma_4$ also goes from $R$ to $R(S)$, and $[a_5, b_5, c_5] = [-1, -2, 5]$.
A final application of $S^{-1}$ gives $[a_6, b_6, c_6] = [-1, -4, 2]$. Since $\Sigma_6$
goes from $R$ to $R(W)$, we invert, to get $[a_7, b_7, c_7] = [2, 4, -1]$. $\Sigma_7$
goes from $R$ to $R(S^{-1})$, so we must make the translation $S$: $z = z' - 1$.
In general, this replaces $[a, b, c]$ by $[a, b - 2a, a - b + c]$, so here
$[a_8, b_8, c_8] = [2, 0, -3]$. A second application of $S$ gives $[a_9, b_9, c_9] = [2, -4, -1] = [a_1, b_1, c_1]$,
and we have the complete set of reduced
forms for this class. If the algorithm were repeated indefinitely, a
periodic sequence of forms would arise; it is therefore meaningful to
speak of the period of reduced forms.
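
The worked example can be replayed mechanically. The following is a minimal sketch, not part of the original text: the function names `W`, `S_inv`, `S`, `is_reduced`, and `discriminant` are chosen here for illustration, and the order of steps is simply copied from the example rather than derived from the geometry. It applies the three substitution rules quoted above, checks condition (24) for each form, and confirms that the discriminant $D = b^2 - 4ac$ stays fixed.

```python
def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c

def W(form):
    # Inversion z = -1/z': [a, b, c] -> [c, -b, a]
    a, b, c = form
    return (c, -b, a)

def S_inv(form):
    # Translation z = z' + 1: [a, b, c] -> [a, 2a + b, a + b + c]
    a, b, c = form
    return (a, 2 * a + b, a + b + c)

def S(form):
    # Translation z = z' - 1: [a, b, c] -> [a, b - 2a, a - b + c]
    a, b, c = form
    return (a, b - 2 * a, a - b + c)

def is_reduced(form):
    # Condition (24): a(2a +/- b + 2c) < 0 for some sign, or 2a - b + 2c = 0.
    a, b, c = form
    return (a * (2 * a + b + 2 * c) < 0
            or a * (2 * a - b + 2 * c) < 0
            or 2 * a - b + 2 * c == 0)

# Sequence of steps taken in the example, starting from [2, -4, -1].
steps = [W, S_inv, S_inv, S_inv, S_inv, W, S, S]
cycle = [(2, -4, -1)]
for step in steps:
    cycle.append(step(cycle[-1]))

for form in cycle:
    print(form, "D =", discriminant(form), "reduced:", is_reduced(form))
```

The ninth form printed coincides with the first, so the period consists of the eight forms in between.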

The following principle is useful in these calculations: If after a
translation the inequality (24) is correct for just one choice of sign,
the next step is an inversion, while if it holds for both signs, the next
step is a repetition of the translation. ($S$ is never followed by $S^{-1}$,
nor $W$ by $W$.) The reason for this should become clear upon looking
back at the derivation of (24).


THEOREM 1-11. There are only finitely many classes of integral
indefinite forms of given discriminant.


Proof: First consider the primitive forms; for them it suffices to
show that there are only finitely many reduced forms of given
discriminant $\Delta = -D$. From (24) we get

$$2a^2 \pm ab < -2ac,$$

so

$$4a^2 \pm 2ab + b^2 < b^2 - 4ac = D.$$

But for each choice of sign, $4a^2 \pm 2ab + b^2$ is positive definite; it
therefore represents only positive integers unless $a = b = 0$, and by
Theorem 1-6, each of the integers $1, 2, \ldots, D$ is represented in only
finitely many ways. Hence there are only finitely many choices for
$a$ and $b$, and for each choice, $c$ is fixed by the requirement $b^2 = D + 4ac$.
There are therefore only finitely many reduced forms, and hence only
finitely many periods, and so only finitely many classes.
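
The finiteness argument is effective, and a short search makes it concrete. The sketch below is an illustration under stated assumptions, not the book's procedure: the function name `reduced_forms` is hypothetical, and the search box comes from rewriting $4a^2 \pm 2ab + b^2$ as $3a^2 + (a \pm b)^2$ and as $(2a \pm b/2)^2 + 3b^2/4$, which gives $3a^2 \le D$ and $3b^2 \le 4D$. The bounds are taken non-strict so that the boundary case $2a - b + 2c = 0$ of (24) is not lost.

```python
from math import gcd, isqrt

def reduced_forms(D):
    """List primitive forms [a, b, c] with b^2 - 4ac = D satisfying (24)."""
    forms = []
    a_max = isqrt(D // 3)        # from 3a^2 <= D
    b_max = isqrt(4 * D // 3)    # from 3b^2 <= 4D
    for a in range(-a_max, a_max + 1):
        if a == 0:
            continue
        for b in range(-b_max, b_max + 1):
            num = b * b - D
            if num % (4 * a) != 0:
                continue             # c would not be an integer
            c = num // (4 * a)       # c is fixed by b^2 = D + 4ac
            if gcd(gcd(a, b), c) != 1:
                continue             # keep primitive forms only
            if (a * (2 * a + b + 2 * c) < 0
                    or a * (2 * a - b + 2 * c) < 0
                    or 2 * a - b + 2 * c == 0):
                forms.append((a, b, c))
    return forms

for form in reduced_forms(24):
    print(form)
```

For $D = 24$ the output contains, among others, the eight forms of the period of $[2, -4, -1]$ computed in the example above.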

If a class contains an imprimitive form, say with (a, b, c) = d, then 
every form in that class also has divisor d, so that the class consists of 
the elements of a class of primitive forms with smaller D, each multi- 
plied by d. There are only finitely many such classes. 


PROBLEMS 
1. Find the period of reduced forms belonging to the class of
$x^2 + 7xy + 7y^2$.
2. Show that Theorem 1-7 remains correct if the word "indefinite" is
omitted, that is, if $D < 0$ (cf. Theorem 1-4).


3. Show that there is just one class of primitive forms with D = 20, and 
one class of imprimitive forms. 


1-10 Representations. The discussion occurring between Theorems
1-4 and 1-6 made no use of the definiteness of the form; it is
therefore equally applicable to indefinite forms. Thus Theorem 1-6
can be recast as follows.


THEOREM 1-12. Let $f = [a, b, c]$ be a primitive integral indefinite
form of discriminant $\Delta$, where $D = -\Delta$ is not a square. Let $n$ be an
integer. Corresponding to each minimum root $m$ of the congruence
(15), determine $l$ by (16). If none of the forms $[n, m, l]$ is equivalent
to $f$, there are no proper representations of $n$ by $f$. If at least one of the
new forms is equivalent to $f$, there are infinitely many proper
representations of $n$ by $f$; they are given by the first columns of all the
matrices $AT$, where $A$ can be any automorph $\pm V^n$ of $f$, and $T$ is any of a
set of matrices which replace $f$ by the various equivalent forms $[n, m, l]$,
each form being obtained from just one $T$.


PROBLEMS 


1. Discuss the proper representation of 13 by $[1, 3, -1]$.


2. Show that the odd numbers properly represented by $x^2 + 4xy - y^2$
are those of the form

$$5^e \prod_{i=1}^{r} p_i^{\alpha_i},$$

where $e = 0$ or 1, $r \ge 0$, and $p_i \equiv \pm 1 \pmod{10}$ for $1 \le i \le r$ (cf. Problem 3,
Section 1-10).


CHAPTER 2 


ALGEBRAIC NUMBERS 


2-1 Introduction. With a few exceptions, the theory developed 
up to this point, both in this volume and in the preceding intro- 
ductory volume, has been self-contained, in the sense that the prob- 
lems, which had to do with the ordinary integers, were solved 
without going outside this system. When considering the distri- 
bution of primes and the theory of quadratic forms, we made use 
of the real and complex numbers, but not in an intrinsically arith- 
metic fashion. In the investigation of the representability of an 
integer as a sum of squares,* however, we had occasion to consider 
the arithmetic structure of the set of Gaussian integers, and to apply 
this to a problem involving ordinary integers. During the last 
century, it has been found that many problems in rational arithmetic 
are treated most naturally by introducing larger sets of “integers” 
and deducing, from the structure of the extended system, information 
about the ordinary integers. Of course, as soon as a mathematician 
begins to work in a new medium, to use a metaphor from art, he 
finds interesting questions which have little or nothing to do with the 
original problem. In the present case, this tendency was instrumental 
in the development of modern abstract algebra, a large portion of 
which has only a tenuous connection with number theory. 

From the point of view of this text, general algebraic theory must 
take second place, the primary object being to give the reader an 
appreciation of the power afforded by the method, as well as a knowl- 
edge of some of the basic results in the subject. For this reason, the 
formulation will be kept as concrete as possible; there will be no 
striving for generality or abstractness for their own sakes. The 
treatment is self-contained, except for the following two theorems, 
whose proofs can be found, for example, in L. E. Dickson, First 
Course in the Theory of Equations (New York: John Wiley & Sons, 
Inc., 1921), pp. 130-131 and 124-125, respectively. 


*See, for example, Volume I, Chapter 7. 




The product $D_1D_2$ of two determinants of the same order is another
determinant of that order, whose element in row $i$ and column $j$ is the
sum of the products of the elements of the $i$th row of $D_1$ and the
corresponding elements of the $j$th row of $D_2$.


SYMMETRIC FUNCTION THEOREM. Any polynomial $P(x_1, \ldots, x_n)$
symmetric in $x_1, \ldots, x_n$ and of degree $g$ in each, is equal to a
polynomial of total degree $g$, with integral coefficients, in the elementary
symmetric functions

$$\sum x_1, \quad \sum x_1x_2, \quad \ldots, \quad x_1x_2\cdots x_n$$

and the coefficients of $P(x_1, \ldots, x_n)$. In particular, any symmetric
polynomial with integral coefficients is equal to a polynomial in the
elementary symmetric functions with integral coefficients.

If $P$ is a polynomial in the roots of an equation $f(x) = 0$ of degree $n$
and leading coefficient 1, and if $P$ is symmetric in $n - 1$ of the roots,
then $P$ is equal to a polynomial, with integral coefficients, in the
remaining root and the coefficients of $f(x)$ and $P$.


We shall also have occasion to use the so-called Fundamental 
Theorem of algebra; this basic assertion is proved in the remainder 
of the section. 


FUNDAMENTAL THEOREM OF ALGEBRA. A polynomial $f(z) = a_0z^n + \cdots + a_n$
having complex coefficients and positive degree, has a
complex zero. (It follows immediately that it has exactly $n$ complex
zeros, in the sense that there are complex numbers $\xi_1, \ldots, \xi_n$ such that

$$f(z) = a_0(z - \xi_1) \cdots (z - \xi_n).)$$


Proof: Since the truth of the theorem depends on the structure of 
the complex numbers, it is necessary to use some properties of these 
numbers. If the entire theory of functions of a complex variable is 
assumed, the proof is very easy indeed: an analytic function has as 
many zeros as poles, and a polynomial has a pole at infinity, so it 
must have at least one zero. If less than this is assumed, it is reason- 
able to ask that as little be assumed as possible. The proof to be 
given uses the fact that a real-valued continuous function of two real 
variables has a minimum value in any closed domain, and it assumes
familiarity with the symbol $\sqrt{a}$, where $a$ is real. (If DeMoivre's
theorem were used, to give meaning to $\sqrt{a}$ for complex $a$, the proof
would be slightly simpler.)


With the second assumption, the quadratic formula provides a
proof when $n = 2$ and the coefficients are real. To solve a quadratic
equation with nonreal coefficients, it may be necessary to extract the
square root of a nonreal number. Let the number be $a + bi$. Then
the equation $a + bi = (x + iy)^2$ gives

$$a = x^2 - y^2 \quad\text{and}\quad b = 2xy,$$

or

$$4x^4 - 4ax^2 - b^2 = 0,$$

and we can take

$$x = \sqrt{\frac{a + \sqrt{a^2 + b^2}}{2}}, \qquad y = \frac{b}{2x}.$$
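
The extraction of a complex square root using only real square roots can be checked numerically. The sketch below is an illustration, with the function name `complex_sqrt` chosen here: it solves the quartic $4x^4 - 4ax^2 - b^2 = 0$ for the positive root $x^2 = \bigl(a + \sqrt{a^2 + b^2}\bigr)/2$ and then sets $y = b/2x$, assuming $b \ne 0$ so that $x > 0$.

```python
import math

def complex_sqrt(a, b):
    """Return real (x, y) with (x + iy)^2 = a + bi, assuming b != 0."""
    # Positive root of 4x^4 - 4a x^2 - b^2 = 0, viewed as a quadratic in x^2.
    x = math.sqrt((a + math.hypot(a, b)) / 2)
    y = b / (2 * x)          # from b = 2xy
    return x, y

x, y = complex_sqrt(3, 4)
print(x, y)                  # components of a square root of 3 + 4i
print(complex(x, y) ** 2)    # should reproduce 3 + 4i up to rounding
```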


Before treating the general case, note first that we can write
$f(x + iy) = G(x, y) + iH(x, y)$, where $G$ and $H$ are polynomials in
the real variables $x$ and $y$, with real coefficients. It follows from the
continuity of $G$ and $H$ throughout the $xy$-plane that $|f(z)|$ is continuous
throughout the complex $z$-plane, where $z = x + iy$. Moreover,
for $n > 0$ and $a_0 \ne 0$ (which we henceforth assume), we have

$$\lim_{|z| \to \infty} |f(z)| = \infty.$$

For if $\max(|a_0|, \ldots, |a_n|) = A$, then

$$|f(z)| \ge |a_0z^n| - \bigl(|a_n| + |a_{n-1}z| + \cdots + |a_1z^{n-1}|\bigr)$$

$$\ge |a_0z^n|\left(1 - \frac{nA}{|a_0z|}\right) \quad\text{for } |z| > 1,$$

$$> \frac{|a_0z^n|}{2} \quad\text{for } |z| > \max\left(\frac{2nA}{|a_0|},\ 1\right).$$

Since $|f(z)|$ is continuous, it assumes a minimum value at some point
in any closed circular disk with center at 0, and since $|f(z)|$ becomes
infinite with $|z|$, the disk can be chosen so large that this minimum
occurs at an interior point $\xi$. We must show that $|f(\xi)| = 0$.

We now proceed by induction: suppose that every polynomial of
degree less than $n$, with complex coefficients, has a complex zero, and
that $f$ is of degree $n$ and $|f(z)|$ assumes its nonzero minimum at $\xi$.
Suppose that $f(\xi) = M$, and put

$$g(z) = \frac{f(z + \xi)}{M} = 1 + b_1z + \cdots + b_nz^n;$$

then $|g(z)| \ge 1$ for all $z$. Define $k$ as the smallest index such that
$b_k \ne 0$, so that

$$g(z) = 1 + b_kz^k + \cdots + b_nz^n, \qquad k \le n.$$

First consider the case that $k < n$. By the induction hypothesis,
the equation

$$1 + b_kz^k = 0$$

has a root. Let $\eta$ be this root, and put $z = \delta\eta$, where $0 < \delta < 1$.
Then

$$g(\delta\eta) = 1 + b_k\delta^k\eta^k + b_{k+1}\delta^{k+1}\eta^{k+1} + \cdots + b_n\delta^n\eta^n$$

$$= 1 - \delta^k + b_{k+1}\delta^{k+1}\eta^{k+1} + \cdots + b_n\delta^n\eta^n.$$

Now if $|b_j| < B$ for $k < j \le n$, then

$$|b_{k+1}\delta^{k+1}\eta^{k+1} + \cdots + b_n\delta^n\eta^n| < B(1 + |\eta|)^n(\delta^{k+1} + \delta^{k+2} + \cdots + \delta^n)$$

$$< Bn(1 + |\eta|)^n\delta^{k+1} = C\delta^{k+1}.$$

Thus

$$|g(\delta\eta)| < 1 - \delta^k + C\delta^{k+1} = 1 - \delta^k(1 - C\delta),$$

and for $0 < \delta < 1/C$, $|g(\delta\eta)| < 1$. This contradicts the assumption
that 1 is the minimum of $|g(z)|$; hence $M = 0$.

If $k = n$, then $g(z) = 1 + b_nz^n$. If $n$ is even, then the equation

$$z^{n/2} = \sqrt{-\frac{1}{b_n}}$$

is solvable, by the induction hypothesis, and any root of it is also a
root of $g(z) = 0$. Hence we can suppose that $n$ is odd. Put $b_n = c + di$.
If $c \ne 0$, we put $z = -\delta\,\mathrm{sgn}\,c$ (that is, $z = \delta$ or $-\delta$, according as
$c < 0$ or $c > 0$), and obtain

$$|1 + (c + di)z^n|^2 = |1 - |c|\delta^n - \delta^n di\,\mathrm{sgn}\,c|^2 = 1 - 2|c|\delta^n + (c^2 + d^2)\delta^{2n};$$

this last expression is again smaller than 1 for $\delta$ sufficiently small, and
we have the same contradiction as before.

If $c = 0$, then $d \ne 0$; moreover, a sign can be chosen so that
$(\pm i)^n = i$. Then if $z = \pm i\delta\,\mathrm{sgn}\,d$, we have

$$|1 + id(\pm i\delta\,\mathrm{sgn}\,d)^n| = |1 - |d|\delta^n|,$$

and this is smaller than 1 for $\delta$ sufficiently small. The proof is
complete.


2-2 Polynomials and algebraic numbers. We begin by making 
the following definitions. 

(a) $R$ is the set of all rational numbers.

(b) $R[x]$ consists of $R$ together with all polynomials in $x$ with
rational coefficients, the coefficient of the highest power of $x$ being
different from zero.

(c) If a polynomial $p(x)$ is in $R[x]$, $\deg p$ means the exponent
of the highest power of $x$ occurring in $p(x)$, if this is positive; if
$a \ne 0$ is in $R$, $\deg a = 0$, while if $a = 0$, $\deg a$ is not defined.

(d) A polynomial $p(x)$ in $R[x]$ is said to be monic if the leading
coefficient is 1.

(e) If $p_1(x)$ and $p_2(x)$ are in $R[x]$, we say that $p_2(x)$ divides $p_1(x)$
(in symbols, $p_2(x) \mid p_1(x)$; the phrase does not divide is indicated by
the symbol "$\nmid$") if there is a $q(x)$ in $R[x]$ such that $p_1(x) = p_2(x)q(x)$.
Under this definition, an element of $R$ different from zero divides
every element of $R[x]$. The nonzero elements of $R$ are therefore called
units of $R[x]$.

(f) An element $p(x)$ is said to be irreducible in $R[x]$ if it cannot be
written as the product of two nonunit elements of $R[x]$.

By formalizing the ordinary process of dividing one polynomial
by another, it is not hard to show that if $p_1(x)$ and $p_2(x)$ are in $R[x]$,
and $p_2(x)$ is not zero, then there exists a unique pair of elements $q(x)$
and $r(x)$ of $R[x]$ such that

$$p_1(x) = p_2(x)q(x) + r(x), \qquad \deg r < \deg p_2 \ \text{or}\ r(x) = 0.$$

This analog of the division theorem for integers* forms the basis for a
Euclidean algorithm, by means of which a greatest common divisor
$(p_1(x), p_2(x))$ can be determined; the development is entirely
parallel to that for the integers, and leads to the following theorems.

*See, for example, Volume I, Theorem 1-1. 




THEOREM 2-1. Given two elements $p_1(x)$, $p_2(x)$ of $R[x]$, not both
zero, there is another element $d(x)$ which is unique to within a unit
factor and which has the following properties:

(a) $d(x) \mid p_1(x)$ and $d(x) \mid p_2(x)$.

(b) If $d_1(x)$ is in $R[x]$, and divides both $p_1(x)$ and $p_2(x)$, then
$d_1(x) \mid d(x)$.

If $(p_1(x), p_2(x)) = d(x)$, there are elements $q_1(x)$ and $q_2(x)$ of
$R[x]$ such that

$$p_1(x)q_1(x) + p_2(x)q_2(x) = d(x).$$
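
The division theorem and the Euclidean algorithm in $R[x]$ can be carried out with exact rational arithmetic. What follows is a minimal sketch under stated conventions, not a definitive implementation: polynomials are lists of coefficients with the highest power first, the zero polynomial is the empty list, and the names `trim`, `poly_divmod`, and `poly_gcd` are chosen here for illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients; the zero polynomial becomes []."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return list(p[i:])

def poly_divmod(p1, p2):
    """Return (q, r) with p1 = p2*q + r and deg r < deg p2 or r = 0."""
    r = trim([Fraction(c) for c in p1])
    p2 = trim([Fraction(c) for c in p2])
    if not p2:
        raise ZeroDivisionError("division by the zero polynomial")
    q = []
    while len(r) >= len(p2):
        factor = r[0] / p2[0]                  # next quotient coefficient
        q.append(factor)
        padded = p2 + [0] * (len(r) - len(p2))
        # Subtract factor * x^k * p2; the leading term cancels and is dropped.
        r = [rc - factor * pc for rc, pc in zip(r, padded)][1:]
    return trim(q), trim(r)

def poly_gcd(p1, p2):
    """Monic greatest common divisor of two polynomials, not both zero."""
    a = trim([Fraction(c) for c in p1])
    b = trim([Fraction(c) for c in p2])
    while b:
        a, b = b, poly_divmod(a, b)[1]
    return [c / a[0] for c in a]               # normalize by a unit

# Example: x^3 - 1 divided by x^2 - 1, and their greatest common divisor.
print(poly_divmod([1, 0, 0, -1], [1, 0, -1]))
print(poly_gcd([1, 0, 0, -1], [1, 0, -1]))
```

Dividing the final remainder by its leading coefficient reflects the remark that the greatest common divisor is unique only to within a unit factor.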


THEOREM 2-2. Any nonzero element of R[x] can be factored into a 
product of irreducible elements of R[x], and this factorization is unique 
except for the order of factors and the presence of units. 


There is no loss in generality, and some gain in simplicity, in sup- 
posing that the various polynomials with which we deal are monic, 
since any polynomial can be made monic by multiplication by a unit. 
In this case the second part of Theorem 2-2 could be restated to read: 
The factorization of a monic polynomial into irreducible monic elements 
is unique except for the order of factors. 

We now consider the zeros of the polynomials of $R[x]$, or, what is
the same thing, the roots of equations $p(x) = 0$. If $\alpha$ is a root of the
equation

$$p(x) = x^n + r_1x^{n-1} + r_2x^{n-2} + \cdots + r_n = 0, \tag{1}$$

where $p(x)$ is in $R[x]$ and $n > 0$, then $\alpha$ is called an algebraic number;
if $p(x)$ is irreducible in $R[x]$, $\alpha$ is said to be of degree $n$. (The rational
numbers are algebraic numbers, since if $r$ is in $R$, $x - r = 0$ has the
root $x = r$. As algebraic numbers they are of degree 1, although when
considered as elements of $R[x]$ the nonzero rational numbers were
given degree 0.) An algebraic number $\alpha$ is a zero of a unique monic
irreducible polynomial in $R[x]$, called the defining polynomial of $\alpha$.
For if $p(x)$ is not irreducible, it can be factored uniquely into
irreducible monic factors, and $\alpha$ must be a zero of one of the factors.
Hence $\alpha$ satisfies some irreducible equation, i.e., an equation in which
the left side is irreducible in $R[x]$. If $\alpha$ satisfies two such equations,
say $p(x) = 0$ and $q(x) = 0$, then it also satisfies the equation $d(x) = 0$,
where $d(x) = (p(x), q(x))$. For if

$$p(x)s_1(x) + q(x)s_2(x) = d(x),$$

then

$$d(\alpha) = s_1(\alpha)\cdot 0 + s_2(\alpha)\cdot 0 = 0.$$

But since $p(x)$ and $q(x)$ are irreducible, their monic GCD is either 1 or
$p(x)$. Since $1 \ne 0$, $(p(x), q(x)) = p(x)$, and $p(x) = q(x)$.
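If $p(x)$ in equation (1) is the defining polynomial of $\alpha$, its $n$ zeros
$\alpha_1 = \alpha, \alpha_2, \ldots, \alpha_n$ are called the conjugates of $\alpha$. Except for an
alternation in signs, the numbers $r_1, r_2, \ldots, r_n$ are simply the
elementary symmetric functions of $\alpha_1 = \alpha, \alpha_2, \ldots, \alpha_n$:

$$r_1 = -\sum \alpha_i = -(\alpha_1 + \alpha_2 + \cdots + \alpha_n),$$

$$r_2 = \sum \alpha_i\alpha_j = \alpha_1\alpha_2 + \cdots + \alpha_{n-1}\alpha_n,$$

$$\cdots$$

$$r_n = (-1)^n\alpha_1\alpha_2\cdots\alpha_n.$$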


As is the case here, we shall frequently use a Greek letter, both with 
and without the subscript 1, to denote a single algebraic number. 


THEOREM 2-3. The sum, difference, and product of two algebraic
numbers are algebraic numbers. The quotient of two algebraic numbers
is an algebraic number if the denominator is not zero.


Proof: Suppose that $\alpha = \alpha_1$ and $\beta = \beta_1$ have defining polynomials

$$p(x) = x^n + r_1x^{n-1} + \cdots + r_n = (x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_n),$$

$$q(x) = x^m + s_1x^{m-1} + \cdots + s_m = (x - \beta_1)(x - \beta_2)\cdots(x - \beta_m),$$

respectively. Let $\gamma_1, \gamma_2, \ldots, \gamma_{nm}$ be the numbers obtained by
adding an $\alpha_i$ and a $\beta_j$, in all possible ways. Then the polynomial
$g(x) = (x - \gamma_1)(x - \gamma_2)\cdots(x - \gamma_{nm})$ has, as coefficients,
symmetric polynomials in the $\alpha_i$ and $\beta_j$, with integral coefficients. Let
one such coefficient be $\varphi(\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_m)$. As a symmetric
polynomial in the $\alpha_i$ it is equal to a polynomial in $r_1, \ldots, r_n$, whose
coefficients are themselves polynomials in $\beta_1, \ldots, \beta_m$ with integral
coefficients. These last polynomials are symmetric in $\beta_1, \ldots, \beta_m$;
they are therefore integral combinations of $s_1, \ldots, s_m$, and
consequently are rational numbers. Thus the coefficients of $g(x)$ are
rational numbers, and $\alpha + \beta$ is an algebraic number. The same proof
applies for $\alpha \cdot \beta$ and $\alpha - \beta$, with obvious changes in the definition of
$\gamma_1, \ldots, \gamma_{nm}$.

If $\alpha$ is algebraic and different from zero, so is $1/\alpha$, for the zeros of
the polynomial

$$r_nx^n + r_{n-1}x^{n-1} + \cdots + r_1x + 1$$

are the reciprocals of those of

$$x^n + r_1x^{n-1} + \cdots + r_n,$$

and $r_n \ne 0$. Thus the assertion that $\alpha/\beta$ is algebraic is a consequence
of the fact that $\alpha \cdot \dfrac{1}{\beta}$ is algebraic.
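
The construction in the proof can be tried on a small case. The sketch below is a numerical illustration only, with $\alpha = \sqrt{2}$ and $\beta = \sqrt{3}$ chosen here as an example and the helper name `poly_from_roots` hypothetical. It forms all sums of a conjugate of $\alpha$ and a conjugate of $\beta$, multiplies out $g(x) = \prod (x - \gamma_i)$ in floating point, and rounds the coefficients to exhibit them as rational integers.

```python
import math
from itertools import product

def poly_from_roots(roots):
    """Coefficients (highest power first) of the product of (x - r)."""
    coeffs = [1.0]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [c - r * prev
                  for c, prev in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

alphas = [math.sqrt(2), -math.sqrt(2)]   # conjugates of sqrt(2): zeros of x^2 - 2
betas = [math.sqrt(3), -math.sqrt(3)]    # conjugates of sqrt(3): zeros of x^2 - 3
gammas = [a + b for a, b in product(alphas, betas)]

g = poly_from_roots(gammas)
g_rounded = [round(c) for c in g]
print(g)           # floating-point coefficients of g(x)
print(g_rounded)   # the same coefficients rounded to integers
```

The rounding step is a shortcut that works only because the exact coefficients are known to be rational; an exact treatment would follow the symmetric-function argument of the proof.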


The properties of the set of all algebraic numbers mentioned in 
Theorem 2-3 are shared by many sets of importance in mathematics; 
so many in fact that the name field has been reserved to describe such 
sets. Technically, a field $F$ is a set of two or more elements $a, b, \ldots$,
together with an equivalence relation (which we designate by an
equals sign) and two operations (which we designate by the symbols
"$+$" and "$\cdot$"), such that the following relations hold:

(a) For any $a$ and $b$ in $F$, either $a = b$ or $a \ne b$. If $a = b$, then
$a + c = b + c$ and $a \cdot c = b \cdot c$, for every $c$ in $F$.

(b) The elements form a commutative group with respect to the
operation "$+$", the identity element being designated by "0". In
other words, if $a$, $b$, and $c$ are in $F$, then $a + b$ is in $F$, $a + b = b + a$,
$a + (b + c) = (a + b) + c$, there is an element $-a$ in $F$ such that
$a + (-a) = (-a) + a = 0$, and $a + 0 = 0 + a = a$.

(c) The elements with 0 omitted (which we might call $F^*$) form a
commutative group with respect to the operation "$\cdot$", the identity
element being designated by "1".

(d) Multiplication is distributive with respect to addition; that is,
$a \cdot (b + c) = a \cdot b + a \cdot c$ for every $a$, $b$, and $c$ in $F$.

As long as one is working with a set of real or complex numbers, and
ordinary multiplication, addition, and equality, one can show that
the set forms a field just by showing that if $a$ and $b$ are in the set, so
are $a \pm b$, $ab$, and $a/b$ if $b \ne 0$; the other requirements are
automatically fulfilled. Thus Theorem 2-3 is just the assertion that the
set of all algebraic numbers is a field. Other familiar examples of
fields are the set of all rational numbers, the set of all real numbers,
and the set of all complex numbers. (The integers, on the other hand,
do not form a field, since only the elements $\pm 1$ have inverses, under
multiplication, in the system.) In fact, every field composed of
complex numbers together with the ordinary operations of addition
and multiplication, contains the field $R$ of rational numbers as a
subfield. There are, however, fields with only finitely many elements.
An example of such a field is the set of numbers $0, 1, \ldots, p - 1$, where
$p$ is a prime, with the operations of addition and multiplication modulo $p$;
in this case, $a + b$ is that element $c$ such that $a + b \equiv c \pmod p$; $a \cdot b$ is that
element $d$ such that $a \cdot b \equiv d \pmod p$; $-a$ is 0 or $p - a$, according
as $a$ is 0 or not 0; if $a \ne 0$, $a^{-1}$ is that element $f$ such that
$a \cdot f \equiv 1 \pmod p$.
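
The finite field just described is easy to realize directly. The following sketch is an illustration with the prime $p = 7$ chosen arbitrarily and the function names `add`, `mul`, `neg`, `inv` hypothetical; the inverse is computed as $a^{p-2}$ modulo $p$, which relies on Fermat's theorem rather than on anything stated in this section.

```python
p = 7  # a prime, chosen only for illustration

def add(a, b):
    return (a + b) % p

def mul(a, b):
    return (a * b) % p

def neg(a):
    # -a is 0 or p - a, according as a is 0 or not 0.
    return 0 if a == 0 else p - a

def inv(a):
    # a^(p-2) * a = a^(p-1) = 1 (mod p) for a not divisible by p (Fermat).
    if a % p == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return pow(a, p - 2, p)

for a in range(1, p):
    print(a, "inverse:", inv(a), "negative:", neg(a))
```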

The field of all algebraic numbers will play no role in the present 
discussion. We consider instead certain subfields of it, called algebraic 
number fields, described in the next theorem. 

Let $\vartheta$ be an algebraic number, of degree $n > 1$, whose defining
polynomial is $p(x)$ as given in equation (1), and whose conjugates are
$\vartheta_1 = \vartheta, \vartheta_2, \ldots, \vartheta_n$.


THEOREM 2-4. The set of all numbers of the form

$$\alpha = \frac{q_1(\vartheta)}{q_2(\vartheta)}, \tag{2}$$

where $q_1(x)$ and $q_2(x)$ are in $R[x]$ and $q_2(\vartheta) \ne 0$, is a field, which
will be denoted by $R(\vartheta)$. Every element of $R(\vartheta)$ can be expressed
uniquely in the form

$$\alpha = a_0 + a_1\vartheta + \cdots + a_{n-1}\vartheta^{n-1},$$

where $a_0, a_1, \ldots, a_{n-1}$ are in $R$.

Proof: The first part is clear, since the sum, difference, product and
quotient of rational functions are again rational functions.
Since $q_2(\vartheta) \ne 0$ and $p(x)$ is irreducible, $q_2(x)$ and $p(x)$ are
relatively prime, and for some $t(x)$ and $s(x)$ in $R[x]$,

$$t(x)p(x) + s(x)q_2(x) = 1.$$

This gives $s(\vartheta)q_2(\vartheta) = 1$, and

$$\alpha = \frac{q_1(\vartheta)}{q_2(\vartheta)} = s(\vartheta)q_1(\vartheta),$$

a polynomial in $\vartheta$. Since $p(\vartheta) = 0$,

$$\vartheta^n = -r_1\vartheta^{n-1} - r_2\vartheta^{n-2} - \cdots - r_n.$$

It follows that every positive power of $\vartheta$ can be written as a
polynomial in $\vartheta$ of degree $n - 1$ or less. The same is therefore true of
every element $\alpha$. If there were two different representations of $\alpha$ as
polynomials in $\vartheta$ of degree $n - 1$ or less and with rational coefficients,
their difference would be a polynomial of degree $n - 1$ or less which
vanishes for $x = \vartheta$, which is impossible.
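
The reduction of powers of $\vartheta$ used in the proof can be turned into exact arithmetic in $R(\vartheta)$. The sketch below is an illustration under assumptions made here: elements are stored as coefficient lists $[a_0, a_1, \ldots, a_{n-1}]$, the defining polynomial is passed as the list $[r_1, \ldots, r_n]$, the function name `multiply` is hypothetical, and the example field is generated by a root of $x^3 - 2$.

```python
from fractions import Fraction

def multiply(u, v, r):
    """Multiply two elements of R(theta), given as lists [a_0, ..., a_{n-1}].

    r = [r_1, ..., r_n] holds the lower coefficients of the monic defining
    polynomial x^n + r_1 x^(n-1) + ... + r_n of theta.
    """
    n = len(r)
    prod = [Fraction(0)] * (2 * n - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            prod[i + j] += Fraction(ui) * vj
    # Remove theta^k for k = 2n-2, ..., n using
    # theta^k = -(r_1 theta^(k-1) + ... + r_n theta^(k-n)).
    for k in range(2 * n - 2, n - 1, -1):
        coeff = prod[k]
        prod[k] = Fraction(0)
        for j in range(1, n + 1):
            prod[k - j] -= coeff * r[j - 1]
    return prod[:n]

r = [0, 0, -2]                              # theta^3 = 2
print(multiply([1, 1, 0], [1, -1, 1], r))   # (1 + theta)(1 - theta + theta^2)
print(multiply([0, 0, 1], [0, 0, 1], r))    # theta^2 * theta^2
```

The first product is the rational number 3, so in this example $1/(1 + \vartheta) = (1 - \vartheta + \vartheta^2)/3$, a polynomial in $\vartheta$ of degree less than 3 as the theorem asserts.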


If $\alpha$ is an element of the field described in Theorem 2-4, and

$$\alpha = a_0\vartheta^{n-1} + \cdots + a_{n-1} = \varphi(\vartheta),$$

then the numbers

$$\alpha' = \alpha, \quad \alpha'' = \varphi(\vartheta_2), \quad \ldots, \quad \alpha^{(n)} = \varphi(\vartheta_n)$$

are called the field conjugates of $\alpha$. (They may not lie in the field
described in Theorem 2-4.) Every field conjugate of $\alpha$ is also a
conjugate of $\alpha$ in the earlier sense, for if $\alpha$ has the defining equation
$g(x) = 0$, then $g(\varphi(x))$ vanishes for $x = \vartheta$, so that $p(x) \mid g(\varphi(x))$ and
$g(\varphi(\vartheta_i)) = g(\alpha^{(i)}) = 0$. The converse is also true, as the following
theorem shows.


THEOREM 2-5. The set of field conjugates of an element $\alpha$ of $R(\vartheta)$ is
either identical with the set of conjugates of $\alpha$, or consists of several
copies of the set of conjugates of $\alpha$. (Hence $\deg \alpha \mid \deg \vartheta$.) The
polynomial whose zeros are the field conjugates of $\alpha$ is a power of the defining
polynomial of $\alpha$; if it is equal to the defining polynomial, then
$R(\alpha) = R(\vartheta)$.


Proof: Form the field polynomial for $\alpha$:

$$f(x) = (x - \alpha')(x - \alpha'')\cdots(x - \alpha^{(n)}).$$

Its coefficients are symmetric polynomials in the $\alpha^{(i)}$'s, and are
therefore symmetric polynomials in $\vartheta_1, \ldots, \vartheta_n$, and so are rational
numbers. Factor $f(x)$ into its monic irreducible factors in $R[x]$, say

$$f(x) = f_1(x)\cdot f_2(x)\cdots,$$

and let $f_1(x)$ be a factor which vanishes for $x = \alpha$. Then $f_1(\varphi(\vartheta)) = 0$,
so $p(x) \mid f_1(\varphi(x))$, and $f_1(x)$ vanishes at $\alpha', \alpha'', \ldots, \alpha^{(n)}$. If these
are distinct, $f_1(x)$ is of degree $n$, and $f(x)$ is irreducible. If they are
not, let $\alpha', \alpha'', \ldots, \alpha^{(t)}$ be a maximal distinct set of $\alpha$'s. Then $f_2(x)$
vanishes for some $\alpha^{(i)}$, so $f_1(x) \mid f_2(x)$; since $f_2(x)$ is irreducible,
$f_2(x) = cf_1(x)$, and $c = 1$ since $f_1(x)$ and $f_2(x)$ are monic. If there
are other factors of $f(x)$, the argument can be repeated. Eventually,
we find that

$$f(x) = (f_1(x))^{n/t}.$$

Since the zeros of $f_1(x)$, which is the defining polynomial of $\alpha$, are the
conjugates of $\alpha$, those of $f(x)$ (that is, the field conjugates) consist of
$n/t$ copies of the set of conjugates of $\alpha$.

Now suppose that $f(x) = f_1(x)$. Define

$$\varphi(x) = f(x)\left[\frac{\vartheta_1}{x - \alpha'} + \cdots + \frac{\vartheta_n}{x - \alpha^{(n)}}\right],$$

so that $\varphi(x)$ is a polynomial of degree $n - 1$ with rational coefficients.
Since

$$\varphi(\alpha) = \vartheta(\alpha - \alpha'')\cdots(\alpha - \alpha^{(n)}) = \vartheta f'(\alpha),$$

we have that the number

$$\vartheta = \frac{\varphi(\alpha)}{f'(\alpha)}$$

is in $R(\alpha)$, so that $R(\vartheta)$ is a subfield of $R(\alpha)$, and $R(\alpha) = R(\vartheta)$.


The last assertion of the theorem shows that if one field $R(\alpha)$ is
a proper subfield of a second field $R(\vartheta)$, then $\deg \alpha < \deg \vartheta$. For if
$\deg \alpha = \deg \vartheta$, then the field polynomial of $\alpha$ with respect to $R(\vartheta)$ is
irreducible, so that $f_1(x) = f(x)$, and $R(\alpha) = R(\vartheta)$.

The field $R(\vartheta)$ is called an algebraic number field; we say that $R(\vartheta)$
is obtained by adjoining $\vartheta$ to $R$, and call $R(\vartheta)$ a simple algebraic
extension of $R$, of degree $n$. This same field can be obtained by
adjoining various other numbers to $R$; for example, $R(2\vartheta) = R(\vartheta)$.
If an element $\alpha$ of $R(\vartheta)$ is such that $R(\alpha) = R(\vartheta)$, then $\alpha$ is called a
primitive element of $R(\vartheta)$. It is clear that the degrees of any two
primitive elements are the same, and both are equal to the degree
of the field.

There is, of course, no reason why the process of adjunction cannot
be repeated; one can start from $R(\vartheta)$ and adjoin an algebraic number
$\eta$ to it by taking all rational functions of $\eta$ whose coefficients are
elements of $R(\vartheta)$. This new field is denoted by $R(\vartheta)(\eta)$, or more
simply by $R(\vartheta, \eta)$.


THEOREM 2-6. If $\vartheta$ and $\eta$ are algebraic numbers, the adjunction of
$\eta$ to $R(\vartheta)$ gives the same field $R(\vartheta, \eta)$ as the adjunction of $\vartheta$ to $R(\eta)$.
There exists an algebraic number $\zeta$ such that $R(\vartheta, \eta)$ is identical with
$R(\zeta)$.


Proof: The first part is clear, since both $R(\vartheta, \eta)$ and $R(\eta, \vartheta)$ are
identical with the field consisting of all numbers of the form

$$\frac{q_1(\vartheta, \eta)}{q_2(\vartheta, \eta)}, \qquad q_2(\vartheta, \eta) \ne 0,$$

where $q_1(x, y)$ and $q_2(x, y)$ are polynomials in two variables with
rational coefficients.

If $\eta$ is an element of $R(\vartheta)$, then $R(\vartheta, \eta) = R(\vartheta)$, since a rational
function of a rational function is again a rational function. Assume
then that $\vartheta$ and $\eta$ do not lie in the fields $R(\eta)$ and $R(\vartheta)$ respectively.
Let their defining polynomials be $p_1(x)$ and $p_2(x)$, and let their
conjugates be $\vartheta_1, \ldots, \vartheta_n$ and $\eta_1, \ldots, \eta_m$, respectively. Let $a$ and $b$
be rational numbers, and let $\zeta = \zeta_1, \ldots, \zeta_{nm}$ be all expressions of the
form $a\vartheta_j + b\eta_k$. Since the conjugates of $\vartheta$ are distinct, as are the
conjugates of $\eta$, there is only a finite set of ratios $a/b$ for which some
two of the $\zeta$'s are equal, and we choose $a$ and $b$ so that $a/b$ is not in
this set. Furthermore, we order the $\zeta_i$ so that $\zeta = a\vartheta + b\eta$.

Now put

$$f(x) = (x - \zeta_1)(x - \zeta_2)\cdots(x - \zeta_{nm}).$$

This polynomial has no multiple zeros, and its coefficients, being
symmetric in the $\vartheta$'s and $\eta$'s, are rational. We show that $R(\vartheta, \eta) = R(\zeta)$.
It is clear that every element of $R(\zeta)$ is in $R(\vartheta, \eta)$. Suppose
on the other hand that $\rho$ is in $R(\vartheta, \eta)$, and that

$$\rho = \frac{q_1(\vartheta, \eta)}{q_2(\vartheta, \eta)}, \qquad q_2(\vartheta, \eta) \ne 0.$$

Then we can define the numbers $\rho = \rho_1, \ldots, \rho_{nm}$ by the equation

$$\rho_i = \frac{q_1(\vartheta_j, \eta_k)}{q_2(\vartheta_j, \eta_k)},$$

where the same subscripts appear on $\vartheta$ and $\eta$ in the definition of $\rho_i$
as in the definition of $\zeta_i$, for $i = 1, 2, \ldots, nm$. Now put

$$F(x) = f(x)\left[\frac{\rho_1}{x - \zeta_1} + \cdots + \frac{\rho_{nm}}{x - \zeta_{nm}}\right];$$

by the Symmetric Function Theorem, the coefficients of $F(x)$ are
rational. If $i > 1$, the polynomial

$$\frac{f(x)\rho_i}{x - \zeta_i} = \rho_i(x - \zeta_1)\cdots(x - \zeta_{i-1})(x - \zeta_{i+1})\cdots(x - \zeta_{nm})$$

vanishes for $x = \zeta$, and from the representation

$$\frac{f(x)\rho}{x - \zeta} = \rho(x - \zeta_2)\cdots(x - \zeta_{nm})$$

we have

$$F(\zeta) = \rho(\zeta - \zeta_2)\cdots(\zeta - \zeta_{nm}).$$

Since

$$f'(\zeta) = (\zeta - \zeta_2)\cdots(\zeta - \zeta_{nm}) \ne 0,$$

this gives

$$\rho = \frac{F(\zeta)}{f'(\zeta)},$$

and $\rho$ is in $R(\zeta)$.
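
The interpolation formula $\rho = F(\zeta)/f'(\zeta)$ can be observed numerically. The sketch below is a floating-point illustration only, with choices made here that are not in the text: $\vartheta = \sqrt{2}$, $\eta = \sqrt{3}$, $a = b = 1$, and $\rho = \vartheta$ (so $q_1(x, y) = x$ and $q_2 = 1$). The helper names `poly_from_roots` and `evaluate` are hypothetical.

```python
import math
from itertools import product

def poly_from_roots(roots):
    """Coefficients (highest power first) of the product of (x - r)."""
    coeffs = [1.0]
    for r in roots:
        coeffs = [c - r * prev
                  for c, prev in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given highest power first."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value

theta_conj = [math.sqrt(2), -math.sqrt(2)]
eta_conj = [math.sqrt(3), -math.sqrt(3)]
pairs = list(product(theta_conj, eta_conj))   # first pair is (theta, eta)
zetas = [t + e for t, e in pairs]             # zeta_i = theta_j + eta_k
rhos = [t for t, e in pairs]                  # rho_i with matching subscripts

f = poly_from_roots(zetas)
# F(x) = sum over i of rho_i * product over j != i of (x - zeta_j)
F = [0.0] * len(zetas)
for i, rho in enumerate(rhos):
    term = poly_from_roots(zetas[:i] + zetas[i + 1:])
    F = [Fc + rho * tc for Fc, tc in zip(F, term)]

degree = len(f) - 1
f_prime = [c * (degree - k) for k, c in enumerate(f[:-1])]

zeta = zetas[0]
recovered = evaluate(F, zeta) / evaluate(f_prime, zeta)
print([round(c) for c in F])       # coefficients of F, rounded
print(recovered, math.sqrt(2))     # the two values should agree
```

The rounded coefficients of $F$ come out as integers, in line with the assertion that $F(x)$ has rational coefficients, and the quotient reproduces $\sqrt{2}$ as a rational function of $\zeta = \sqrt{2} + \sqrt{3}$.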
PROBLEMS 


1. Prove Eisenstein's irreducibility criterion: a polynomial
$f(x) = a_0 + a_1x + \cdots + a_nx^n$ with integral coefficients cannot be written as a
product of two or more polynomials with integral coefficients and positive
degrees, if there is a prime $p$ such that

$$p \nmid a_n, \qquad p \mid a_i \ \text{if}\ i < n, \qquad\text{and}\qquad p^2 \nmid a_0.$$

[Hint: Suppose that there is such a $p$, but that $f(x) = g(x)h(x)$, where
$g(x) = b_0 + b_1x + \cdots + b_rx^r$, $h(x) = c_0 + c_1x + \cdots + c_sx^s$. It follows
that $p$ divides exactly one of $b_0$ and $c_0$, say $b_0$. Let $b_i$ be the first coefficient
in $g(x)$ not divisible by $p$, and deduce a contradiction from the expression
for $a_i$ in terms of the $b$'s and $c$'s.] As we shall see later (Theorem 2-21),
irreducibility over the set of polynomials with integral coefficients implies
irreducibility over $R[x]$. Use this fact in Problem 2.
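
Checking the hypotheses of the criterion is purely mechanical. The sketch below is an illustration, with the function name `eisenstein_applies` chosen here; it tests the three divisibility conditions for a given prime $p$ and a coefficient list $[a_0, a_1, \ldots, a_n]$. A result of `False` says only that the criterion does not apply for that $p$, not that the polynomial is reducible.

```python
from math import comb

def eisenstein_applies(coeffs, p):
    """Check p does not divide a_n, p | a_i for i < n, p^2 does not divide a_0.

    coeffs = [a_0, a_1, ..., a_n] with n >= 1; p is assumed to be prime.
    """
    if len(coeffs) < 2:
        return False
    *lower, a_n = coeffs
    return (a_n % p != 0
            and all(a % p == 0 for a in lower)
            and coeffs[0] % (p * p) != 0)

print(eisenstein_applies([-5, 0, 0, 1], 5))   # x^3 - 5 with p = 5
print(eisenstein_applies([1, 0, 1], 2))       # x^2 + 1 with p = 2

# ((x + 1)^5 - 1)/x has coefficients C(5, k), k = 1, ..., 5, lowest power first.
shifted = [comb(5, k) for k in range(1, 6)]
print(shifted, eisenstein_applies(shifted, 5))
```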
2. Show that the following polynomials are irreducible over $R[x]$:


(a) $x^n - p$, $p$ a prime.
(b) $x^{p-1} + x^{p-2} + \cdots + x + 1$. [Hint: Replace $x$ by $x + 1$.]
(c) 2% + 327 + 4. 
3. Show that $R(\sqrt[3]{2}, \sqrt{3})$ is identical with $R(\sqrt[3]{2} + \sqrt{3})$, and find a
rational function $r(x)$ with rational coefficients such that $r(\sqrt[3]{2} + \sqrt{3}) = \sqrt[3]{2}$.




2-3 Algebraic integers. If the defining (monic) polynomial of an
algebraic number $\vartheta$ has integral coefficients, $\vartheta$ is said to be an algebraic
integer. This is a direct extension of the notion of ordinary or rational
integers, which are the zeros of monic linear polynomials with integral
coefficients. Hereafter we shall designate by $Z$ the set of all rational
integers.


THEOREM 2-7. The sum, difference, and product of two algebraic 
integers are again algebraic integers. 


The proof follows the lines of the proof of Theorem 2-3. 


THEOREM 2-8. If $\alpha$ is a zero of a monic polynomial with coefficients in $Z$, then $\alpha$ is an algebraic integer.


Proof: Suppose that $f(x) = x^m + a_1 x^{m-1} + \cdots + a_m$ is the polynomial, and that $p(x) = x^n + r_1 x^{n-1} + \cdots + r_n$ is the defining polynomial of $\alpha$. Let $b_0$ be the LCM of the denominators of the reduced fractions $r_1, \ldots, r_n$, so that $b_0 p(x) = q(x) = b_0 x^n + b_1 x^{n-1} + \cdots + b_n$ has relatively prime rational integral coefficients. Then $q(x)$ divides $f(x)$, the coefficients in the quotient polynomial being rational, and we can write

\[ \frac{f(x)}{q(x)} = \frac{c'\,g(x)}{c}, \]

where $c$ and $c'$ are so chosen that $g(x)$ has relatively prime coefficients in $Z$. Thus $c f(x) = c' g(x) q(x)$, and the coefficients of the product $g(x)q(x)$ are relatively prime.* Since this is also true of the coefficients of $f(x)$ (for $f(x)$ is monic), we conclude that $c = c'$. Comparing the coefficients of $x^m$ in the equation $f(x) = g(x)q(x)$, we see that $b_0 \mid 1$, and hence $b_0 = \pm 1$, which was to be proved.


THEOREM 2-9. If $\alpha$ is a root of an equation

\[ f(x) = x^n + \beta_1 x^{n-1} + \cdots + \beta_n = 0, \]

in which $\beta_1, \ldots, \beta_n$ are algebraic integers, then $\alpha$ is an algebraic integer.

Proof: By Theorem 2-6, $\beta_1, \ldots, \beta_n$ all lie in a simple extension field $R(\vartheta)$, of degree $m$, say. We can use the sets of field conjugates


*The reader may prove this simple fact himself, or refer to the remark 
preceding Theorem 3-14, Volume I. 




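\[ (\beta_1'', \ldots, \beta_n''), \quad \ldots, \quad (\beta_1^{(m)}, \ldots, \beta_n^{(m)}) \]

to form polynomials

\[ f_2(x) = x^n + \beta_1'' x^{n-1} + \cdots + \beta_n'', \quad \ldots, \quad f_m(x) = x^n + \beta_1^{(m)} x^{n-1} + \cdots + \beta_n^{(m)}. \]

The product $f(x) f_2(x) \cdots f_m(x)$ has rational integral coefficients and is monic; by Theorem 2-8, $\alpha$ is an algebraic integer.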


The set of integers in a fixed algebraic number field $R(\vartheta)$ is also
closed under addition, subtraction, and multiplication. We shall
designate this set by $R[\vartheta]$, and call it the integral domain of the field.
In particular, $R[1] = Z$ is the set of rational integers.


THEOREM 2-10. If $\vartheta$ is an algebraic number, there exists some rational integer $a \neq 0$ such that $a\vartheta$ is an algebraic integer. If $\vartheta$ satisfies an equation $\beta_0 x^n + \cdots + \beta_n = 0$, in which $\beta_0, \ldots, \beta_n$ are algebraic integers, then $\beta_0\vartheta$ is an algebraic integer.


Proof: Let the defining equation of $\vartheta$ be

\[ p(x) = x^n + r_1 x^{n-1} + \cdots + r_n = 0, \]

and let the LCM of the denominators of the fractions $r_1, \ldots, r_n$ be $a$. Then the polynomial

\[ a^n p\!\left(\frac{x}{a}\right) = x^n + a r_1 x^{n-1} + \cdots + a^n r_n \]

has integral coefficients and is monic and irreducible; its zeros $a\vartheta, a\vartheta_2, \ldots, a\vartheta_n$ are therefore integers. The proof of the second part, using Theorem 2-9, is similar.
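
The scaling used in the first part of this proof can be carried out with exact rational arithmetic. The sketch below is an illustration only and is not taken from the text; the name `integral_multiple` is an assumption, and `math.lcm` with several arguments needs Python 3.9 or later. Given $r_1, \ldots, r_n$ it returns the integer $a$ and the integral coefficients $a r_1, a^2 r_2, \ldots, a^n r_n$ of the monic polynomial satisfied by $a\vartheta$.

```python
from fractions import Fraction
from math import lcm


def integral_multiple(r):
    """r = [r_1, ..., r_n] from p(x) = x^n + r_1 x^(n-1) + ... + r_n.

    Returns (a, [a*r_1, a^2*r_2, ..., a^n*r_n]) where a is the LCM of the
    denominators, i.e. the lower coefficients of a^n * p(x/a).
    """
    r = [Fraction(c) for c in r]
    a = lcm(*(c.denominator for c in r))
    scaled = [c * a ** (k + 1) for k, c in enumerate(r)]
    # Every scaled coefficient is integral because a clears each denominator.
    assert all(c.denominator == 1 for c in scaled)
    return a, [int(c) for c in scaled]


# theta = 1/sqrt(2) has defining polynomial x^2 - 1/2; then 2*theta = sqrt(2)
# is a zero of x^2 - 2.
print(integral_multiple([0, Fraction(-1, 2)]))  # (2, [0, -2])
```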


Since $R(\vartheta)$ and $R(a\vartheta)$ are identical for $a \neq 0$ in $Z$, any algebraic number field can be considered as the result of adjoining an algebraic integer to $R$.

If $\vartheta$ is an integer, so are its conjugates $\vartheta_2, \ldots, \vartheta_n$. The same is therefore true of its field conjugates.

If $\alpha$ is any element of the field $R(\vartheta)$ of degree $n$, the product $\alpha' \alpha'' \cdots \alpha^{(n)}$ of all the field conjugates of $\alpha$ is called the norm of $\alpha$, and denoted by $N\alpha$ (a more complete notation would be $N_{R(\vartheta)}\alpha$).


THEOREM 2-11. The norm of an algebraic integer is a rational 
integer. 


Proof: If $\alpha$ has the defining equation

\[ x^m + s_1 x^{m-1} + \cdots + s_m = 0, \]

then the norm of $\alpha$ (in any given $R(\vartheta)$ containing $\alpha$) is, apart from sign, a power of $s_m$, by Theorem 2-5.


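THEOREM 2-12. If $\alpha$ and $\beta$ are elements of $R(\vartheta)$, then

\[ N(\alpha\beta) = N\alpha \cdot N\beta. \]

Proof: Put

\[ \alpha = a_0 + a_1\vartheta + \cdots + a_{n-1}\vartheta^{n-1}, \qquad \beta = b_0 + b_1\vartheta + \cdots + b_{n-1}\vartheta^{n-1}. \tag{3} \]

Then in the product $\alpha\beta$, powers of $\vartheta$ higher than the $(n - 1)$th can be reduced using the equation

\[ \vartheta^n = -(r_1\vartheta^{n-1} + \cdots + r_n) \tag{4} \]

derived from the defining equation of $\vartheta$. Also $\alpha^{(i)}$ and $\beta^{(i)}$ can be obtained from (3) by replacing $\vartheta$ by $\vartheta_i$, and in the product $\alpha^{(i)}\beta^{(i)}$, higher powers of $\vartheta_i$ can be reduced by using (4) with $\vartheta$ replaced by $\vartheta_i$. Hence the field conjugates $(\alpha\beta)', (\alpha\beta)'', \ldots, (\alpha\beta)^{(n)}$ of $\alpha\beta$ are simply $\alpha'\beta', \alpha''\beta'', \ldots, \alpha^{(n)}\beta^{(n)}$. Thus

\[ N\alpha\beta = (\alpha\beta)'(\alpha\beta)'' \cdots (\alpha\beta)^{(n)} = \alpha'\alpha'' \cdots \alpha^{(n)}\,\beta'\beta'' \cdots \beta^{(n)} = N\alpha \cdot N\beta. \]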
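Now let $\alpha, \beta, \ldots, \nu$ be $n$ elements of $R(\vartheta)$, with field conjugates $\alpha^{(k)}, \beta^{(k)}, \ldots, \nu^{(k)}$, where $k = 1, 2, \ldots, n$. The number

\[ \Delta(\alpha, \beta, \ldots, \nu) = \begin{vmatrix} \alpha' & \alpha'' & \cdots & \alpha^{(n)} \\ \beta' & \beta'' & \cdots & \beta^{(n)} \\ \vdots & \vdots & & \vdots \\ \nu' & \nu'' & \cdots & \nu^{(n)} \end{vmatrix}^2 \]

is called the discriminant of $\alpha, \beta, \ldots, \nu$. Its value is independent of the order of rows or of columns.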


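THEOREM 2-13. If $\alpha, \beta, \ldots, \nu$ are in $R[\vartheta]$, then $\Delta(\alpha, \beta, \ldots, \nu)$ is a rational integer.


Proof: If we take the row-by-column product, we have

\[ \Delta(\alpha, \ldots, \nu) = \begin{vmatrix} \alpha'^2 + \cdots + (\alpha^{(n)})^2 & \cdots & \alpha'\nu' + \cdots + \alpha^{(n)}\nu^{(n)} \\ \vdots & & \vdots \\ \alpha'\nu' + \cdots + \alpha^{(n)}\nu^{(n)} & \cdots & \nu'^2 + \cdots + (\nu^{(n)})^2 \end{vmatrix}. \]

Just as in the proof of Theorem 2-12,

\[ \alpha'\beta' + \alpha''\beta'' + \cdots + \alpha^{(n)}\beta^{(n)} = (\alpha\beta)' + (\alpha\beta)'' + \cdots + (\alpha\beta)^{(n)}, \]

and the sum of the field conjugates of an integer is itself a rational integer, by analogy with the proof of Theorem 2-11. Hence, the number $\Delta(\alpha, \beta, \ldots, \nu)$ can be written as a determinant with rational integral entries, and so is a rational integer.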
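
For the particular system $1, \vartheta, \ldots, \vartheta^{n-1}$ the entries of this determinant are the power sums $S_k = \vartheta_1^k + \cdots + \vartheta_n^k$ of the conjugates, which can be computed from the coefficients of the defining polynomial by Newton's identities. The sketch below is an illustration under that assumption and is not the text's own method; the helper names `power_sums`, `det` and `power_basis_discriminant` are hypothetical.

```python
from fractions import Fraction


def power_sums(c, count):
    """c = [c_1, ..., c_n] of monic f = x^n + c_1 x^(n-1) + ... + c_n.

    Returns [S_0, ..., S_{count-1}] via Newton's identities.
    """
    n = len(c)
    s = [n]
    for k in range(1, count):
        total = sum(c[i - 1] * s[k - i] for i in range(1, min(k - 1, n) + 1))
        if k <= n:
            total += k * c[k - 1]
        s.append(-total)
    return s


def det(m):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, d = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            d = -d
        d *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return d


def power_basis_discriminant(c):
    """Discriminant of 1, theta, ..., theta^(n-1) as det [S_(i+j)]."""
    n = len(c)
    s = power_sums(c, 2 * n - 1)
    return int(det([[s[i + j] for j in range(n)] for i in range(n)]))


print(power_basis_discriminant([0, -5]))    # x^2 - 5: 20
print(power_basis_discriminant([0, 2, 6]))  # x^3 + 2x + 6: -1004
```

The second value agrees with $-4p^3 - 27q^2$ for $p = 2$, $q = 6$.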


The numbers $1, \vartheta, \ldots, \vartheta^{n-1}$ are said to form a basis of $R(\vartheta)$, in the sense that every element of $R(\vartheta)$ can be expressed in a unique way as a linear combination of these numbers, with coefficients in $R$ (cf. Theorem 2-4). We now examine the possibility of finding a basis for $R[\vartheta]$; that is, a set of elements of $R[\vartheta]$ such that every element of $R[\vartheta]$ can be expressed in a unique way as a linear combination of them, the coefficients in this case being in $Z$. To emphasize the distinction between these two kinds of bases, the second is sometimes called an integral basis. Every integral basis is a basis of $R(\vartheta)$, as is immediately seen from Theorem 2-10, but the converse is false.

If $\omega_1, \ldots, \omega_n$ is to be an integral basis, then for any $\rho$ in $R[\vartheta]$, the equation

\[ \rho = x_1\omega_1 + \cdots + x_n\omega_n, \]

and therefore also the equations

\[ \rho^{(k)} = x_1\omega_1^{(k)} + \cdots + x_n\omega_n^{(k)}, \qquad k = 1, \ldots, n, \]

must hold for some rational integers $x_1, \ldots, x_n$. If $\Delta(\omega_1, \ldots, \omega_n) \neq 0$, this system of equations can be solved, giving each $x_i$ as the quotient of determinants, the determinant in each denominator being a square root of $\Delta(\omega_1, \ldots, \omega_n)$. It seems plausible that the smaller $|\Delta(\omega_1, \ldots, \omega_n)|$, the better the chance of obtaining rational integral $x_i$. Hence, if an integral basis always exists, the next theorem ought to be true.


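THEOREM 2-14. If $\omega_1, \omega_2, \ldots, \omega_n$ are any $n$ integers of $R[\vartheta]$ for which $|\Delta(\omega_1, \omega_2, \ldots, \omega_n)|$ has its smallest possible value different from zero, then $\omega_1, \ldots, \omega_n$ form a basis of $R[\vartheta]$.

Proof: Write

\[ \omega_i = \sum_{j=0}^{n-1} a_{ij}\vartheta^j, \qquad i = 1, 2, \ldots, n, \tag{5} \]

where the $a_{ij}$ are in $R$. Then

\[ \Delta(\omega_1, \ldots, \omega_n) = \begin{vmatrix} \omega_1 & \cdots & \omega_n \\ \vdots & & \vdots \\ \omega_1^{(n)} & \cdots & \omega_n^{(n)} \end{vmatrix}^2 = \begin{vmatrix} \sum_{j=0}^{n-1} a_{1j}\vartheta^j & \cdots & \sum_{j=0}^{n-1} a_{nj}\vartheta^j \\ \vdots & & \vdots \\ \sum_{j=0}^{n-1} a_{1j}\vartheta_n^j & \cdots & \sum_{j=0}^{n-1} a_{nj}\vartheta_n^j \end{vmatrix}^2, \]

and this can be factored:

\[ \Delta(\omega_1, \ldots, \omega_n) = \left( \begin{vmatrix} 1 & \vartheta & \cdots & \vartheta^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \vartheta_n & \cdots & \vartheta_n^{n-1} \end{vmatrix} \cdot \begin{vmatrix} a_{10} & \cdots & a_{n0} \\ \vdots & & \vdots \\ a_{1,n-1} & \cdots & a_{n,n-1} \end{vmatrix} \right)^2 = (\det |a_{ij}|)^2\, \Delta(1, \vartheta, \ldots, \vartheta^{n-1}), \tag{6} \]

where $\vartheta, \vartheta_2, \ldots, \vartheta_n$ are the conjugates of $\vartheta$. Since $\Delta(\omega_1, \ldots, \omega_n) \neq 0$, also $\det |a_{ij}| \neq 0$, and the system of equations (5) can be solved for the numbers $1, \vartheta, \ldots, \vartheta^{n-1}$, giving linear expressions in $\omega_1, \ldots, \omega_n$. Thus every number $\rho$ of $R[\vartheta]$ can be written in the form

\[ \rho = b_1\omega_1 + \cdots + b_n\omega_n, \tag{7} \]

where $b_1, \ldots, b_n$ are rational. We must show that they are rational integers.

If this is not the case for the $\rho$ of (7), then some $b_i$ has a nonzero fractional part:

\[ b_i = [b_i] + c, \]

where $0 < c < 1$, and the symbol $[b]$ means the largest integer not exceeding $b$. Put

\[ \rho_i = \rho - [b_i]\omega_i = b_1\omega_1 + \cdots + c\,\omega_i + \cdots + b_n\omega_n. \]

In just the same way that (6) was deduced from (5), we can deduce from the system of equations

\[ \omega_1 = \omega_1, \quad \omega_2 = \omega_2, \quad \ldots, \quad \omega_{i-1} = \omega_{i-1}, \]
\[ \rho_i = b_1\omega_1 + b_2\omega_2 + \cdots + c\,\omega_i + \cdots + b_n\omega_n, \]
\[ \omega_{i+1} = \omega_{i+1}, \quad \ldots, \quad \omega_n = \omega_n \]

the relation

\[ \Delta(\omega_1, \ldots, \rho_i, \ldots, \omega_n) = \begin{vmatrix} 1 & 0 & \cdots & \cdots & 0 \\ 0 & 1 & \cdots & \cdots & 0 \\ \vdots & & & & \vdots \\ b_1 & b_2 & \cdots\ c\ \cdots & & b_n \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & \cdots & 1 \end{vmatrix}^2 \Delta(\omega_1, \omega_2, \ldots, \omega_n) = c^2\, \Delta(\omega_1, \ldots, \omega_n). \]

But this implies that the discriminant of the system $\omega_1, \ldots, \rho_i, \ldots, \omega_n$ is numerically smaller than that of $\omega_1, \ldots, \omega_n$, and is not zero, which is contrary to the hypothesis that $|\Delta(\omega_1, \ldots, \omega_n)|$ is minimal.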


Any two integral bases of a single field have the same discriminant,
since each is the product of the other and the square of a determinant
with integral entries, as in (6). The common value is called the
discriminant of the field; we shall designate it by $\Delta$ hereafter.


PROBLEMS 


1. Let $\vartheta$, $\vartheta'$, and $\vartheta''$ be the roots of
(a) $x^3 + 2x + 6 = 0$,
(b) $x^3 - x^2 - x - 2 = 0$.
Compute the numbers $N_{R(\vartheta)}(3\vartheta - 2)$.
Answer: (a) $-206$; (b) 4, 19.


2-4] UNITS AND PRIMES IN R[d] 53 


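2. (a) Let $f(x) = a_0 x^n + \cdots + a_n$ be irreducible over $R$, and let $\vartheta, \vartheta'', \ldots, \vartheta^{(n)}$ be the zeros of $f$. Show that in $R(\vartheta)$,

\[ a_0^n\, \Delta(1, \vartheta, \ldots, \vartheta^{n-1}) = (-1)^{n(n-1)/2} \prod_{i=1}^{n} f'(\vartheta^{(i)}). \]

[This depends on the well-known factorization

\[ \begin{vmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{vmatrix} = \prod_{1 \le j < i \le n} (x_i - x_j) \]

of a Vandermonde determinant.]

(b) If in particular $f(x) = x^3 + px + q$, show that $\Delta(1, \vartheta, \vartheta^2) = -27q^2 - 4p^3$.

3. Show that if $\alpha_1, \ldots, \alpha_n$ are elements of $R[\vartheta]$ such that $\Delta(\alpha_1, \ldots, \alpha_n)$ is square-free, then $\alpha_1, \ldots, \alpha_n$ form a basis for $R[\vartheta]$.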
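
The answers quoted in Problem 1 can be checked numerically. If $\vartheta$ has the monic defining polynomial $f$ of degree $n$, then $N(a\vartheta - b) = \prod (a\vartheta_i - b) = (-a)^n f(b/a)$. The following is a hedged sketch based on that identity; the function name `norm_of_linear` and its argument convention are assumptions for illustration. For (b) the cubic factors as $(x - 2)(x^2 + x + 1)$, so the norm is taken in each of the two resulting fields.

```python
from fractions import Fraction


def norm_of_linear(f, a, b):
    """Norm of a*theta - b in R(theta).

    f = [1, c_1, ..., c_n] is the monic defining polynomial of theta,
    highest power first.  Uses N(a*theta - b) = (-a)^n * f(b/a).
    """
    n = len(f) - 1
    value = Fraction(0)
    for c in f:                       # Horner evaluation of f at b/a
        value = value * Fraction(b, a) + c
    return (-a) ** n * value


print(norm_of_linear([1, 0, 2, 6], 3, 2))  # -206, Problem 1(a)
print(norm_of_linear([1, -2], 3, 2))       # 4, theta = 2
print(norm_of_linear([1, 1, 1], 3, 2))     # 19, theta a zero of x^2 + x + 1
```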


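2-4 Units and primes in $R[\vartheta]$. If $\alpha$ and $\beta$ are in an integral
domain $R[\vartheta]$, we say that $\beta$ divides $\alpha$, and write $\beta \mid \alpha$, if there is another
element $\gamma$ of $R[\vartheta]$ such that $\alpha = \beta\gamma$. An integer $\epsilon$ such that $\epsilon \mid 1$ is
called a unit of $R[\vartheta]$. We say that $\alpha$ and $\beta$ are associates if $\alpha = \epsilon\beta$,
where $\epsilon$ is a unit.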


THEOREM 2-15. An element of $R[\vartheta]$ is a unit if and only if its norm (as an element of $R(\vartheta)$) is $\pm 1$.


Proof: If $\epsilon$ is a unit and

\[ x^m + c_1 x^{m-1} + \cdots + c_m = 0 \]

is its defining equation, then the defining equation of $1/\epsilon$ is

\[ x^m + \frac{c_{m-1}}{c_m} x^{m-1} + \cdots + \frac{1}{c_m} = 0. \]

Since $1/\epsilon$ is an integer, $c_m = \pm 1$, and $N(1/\epsilon)$ is a power of the constant term in the defining equation of $1/\epsilon$. (Alternatively, this result could be deduced from the multiplicativity of the norm. For if $\epsilon$ is a unit, there exists an integer $\epsilon_1$ such that $\epsilon\epsilon_1 = 1$. Hence $1 = N1 = N\epsilon\epsilon_1 = N\epsilon \cdot N\epsilon_1$, and since the norm of an integer is a rational integer, $N\epsilon = \pm 1$.)

Conversely, if the norm of an element of $R[\vartheta]$ is $\pm 1$, then the constant term in its defining equation is $\pm 1$ (the norm being, apart from sign, a power of that constant term). Then the reciprocal of the element is also an element of $R[\vartheta]$, and the element is a unit.


The units of an integral domain form a multiplicative group, since
the product of units is a unit, 1 is a unit, and each unit $\epsilon$ has an inverse
$\epsilon_1$ such that $\epsilon\epsilon_1 = 1$.


In the domain of rational integers, the only units are $\pm 1$; in the
Gaussian domain $R[i]$, the units are $\pm 1$, $\pm i$. All these units are roots
of unity, but in some domains there are units which are not roots of
unity, and in fact do not have absolute value 1. This was pointed out
in Chapter 8 of Volume I, but we can now go into details.

Let $d$ be a square-free rational integer, and consider the field $R(\sqrt{d})$. As a basis for the field we can take $1, \sqrt{d}$, so that every element of $R(\sqrt{d})$ can be uniquely expressed in the form $a + b\sqrt{d}$, where $a$ and $b$ are in $R$. If $b = 0$, then $a + b\sqrt{d}$ is an integer if and only if $a$ is in $Z$. If $b \neq 0$, the defining equation of $a + b\sqrt{d}$ is

\[ (x - a - b\sqrt{d})(x - a + b\sqrt{d}) = x^2 - 2ax + a^2 - db^2 = 0, \]

so that if $a + b\sqrt{d}$ is in $R[\sqrt{d}]$, both $2a$ and $a^2 - db^2$ must be rational integers. Hence $(2a)^2 - 4(a^2 - db^2) = 4db^2$ is also in $Z$; since $d$ is square-free, it follows that $2b$ is in $Z$.

Suppose that $a = k + \tfrac{1}{2}$, with $k$ in $Z$. Then

\[ 0 \equiv 4a^2 - 4db^2 = 4k^2 + 4k + 1 - 4db^2 \equiv 1 - 4db^2 \pmod 4, \]

and it follows that $2b \equiv 1 \pmod 2$, and $d \equiv 1 \pmod 4$. Conversely, if $a$ and $b$ are halves of odd integers and $d \equiv 1 \pmod 4$, the defining equation of $a + b\sqrt{d}$ has coefficients in $Z$. Hence 1 and $(1 + \sqrt{d})/2$ form a basis of $R[\sqrt{d}]$, if $d \equiv 1 \pmod 4$.

If $d \equiv 2$ or $3 \pmod 4$, then $a$ must be a rational integer. If $b$ were of the form $k + \tfrac{1}{2}$, with $k$ in $Z$, we should have

\[ 0 \equiv 4a^2 - 4db^2 \equiv -(4k^2 + 4k + 1)d \equiv -d \pmod 4, \]

and $d$ would not be square-free. Hence in this case both $a$ and $b$ must be in $Z$, and $1, \sqrt{d}$ form a basis of $R[\sqrt{d}]$.
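
The criterion derived above (both $2a$ and $a^2 - db^2$ in $Z$) translates directly into a test for integrality in $R(\sqrt{d})$. The sketch below is only an illustration; the function name `is_quadratic_integer` is an assumption.

```python
from fractions import Fraction


def is_quadratic_integer(a, b, d):
    """True if a + b*sqrt(d) (a, b rational, d square-free) is an algebraic integer.

    The element is a zero of x^2 - 2a x + (a^2 - d b^2), so it suffices to
    check that the trace 2a and the norm a^2 - d b^2 are rational integers.
    """
    a, b = Fraction(a), Fraction(b)
    trace, norm = 2 * a, a * a - d * b * b
    return trace.denominator == 1 and norm.denominator == 1


half = Fraction(1, 2)
print(is_quadratic_integer(half, half, 5))   # True:  d = 5 is 1 (mod 4)
print(is_quadratic_integer(half, half, 3))   # False: d = 3 is 3 (mod 4)
print(is_quadratic_integer(half, half, -3))  # True:  d = -3 is 1 (mod 4)
```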


THEOREM 2-16. Let $d$ be a square-free rational integer. Then if $d \equiv 1 \pmod 4$, the elements of $R[\sqrt{d}]$ are either of the form

\[ a + b\sqrt{d}, \qquad a \text{ and } b \text{ in } Z, \tag{8} \]

or of the form

\[ \frac{a + b\sqrt{d}}{2}, \qquad a \text{ and } b \text{ in } Z, \quad a \equiv b \equiv 1 \pmod 2, \]

and the discriminant of $R(\sqrt{d})$ is

\[ \Delta = \begin{vmatrix} 1 & \tfrac{1}{2}(1 + \sqrt{d}) \\ 1 & \tfrac{1}{2}(1 - \sqrt{d}) \end{vmatrix}^2 = (-\sqrt{d})^2 = d. \]

If $d \equiv 2$ or $3 \pmod 4$, all the elements of $R[\sqrt{d}]$ are of the form (8), and the discriminant of $R(\sqrt{d})$ is

\[ \Delta = \begin{vmatrix} 1 & \sqrt{d} \\ 1 & -\sqrt{d} \end{vmatrix}^2 = (-2\sqrt{d})^2 = 4d. \]


The units of $R[\sqrt{d}]$ are the integers $\epsilon$ for which $N\epsilon = \pm 1$. If $d \equiv 2$ or $3 \pmod 4$, then $\epsilon$ is of the form (8), so that the units are given by the solutions of the Pell equations

\[ x^2 - dy^2 = \pm 1. \tag{9} \]

If $d \equiv 1 \pmod 4$, the units are the integers of the form $(x + y\sqrt{d})/2$, where $x, y$ is a solution of one of the Pell equations

\[ x^2 - dy^2 = \pm 4. \tag{10} \]

If $d < 0$, these Pell equations have only trivial solutions: (9) has solutions $\pm 1, 0$ in all cases, and $0, \pm 1$ if $d = -1$, while (10) has the solution $\pm 2, 0$ always, and $\pm 1, \pm 1$ if $d = -3$. If $d > 0$, equations (9) and (10) have infinitely many solutions.*
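
For small positive $d$ a nontrivial unit can be found by a direct search for solutions of (9) or (10). The sketch below is a brute-force illustration only, not the continued-fraction method referred to in the footnote; the name `smallest_unit` and the search bound `y_max` are assumptions.

```python
from math import isqrt


def smallest_unit(d, y_max=10_000):
    """Search for the solution of x^2 - d*y^2 = +-1 (or +-4 when d = 1 mod 4)
    with the smallest y >= 1, for square-free d > 1.

    Returns (x, y, rhs) or None if nothing is found up to y_max.
    """
    target = 4 if d % 4 == 1 else 1
    for y in range(1, y_max + 1):
        for rhs in (-target, target):
            x2 = d * y * y + rhs
            if x2 >= 0 and isqrt(x2) ** 2 == x2:
                return isqrt(x2), y, rhs
    return None


print(smallest_unit(2))  # (1, 1, -1): the unit 1 + sqrt(2)
print(smallest_unit(5))  # (1, 1, -4): the unit (1 + sqrt(5))/2
print(smallest_unit(7))  # (8, 3, 1):  the unit 8 + 3*sqrt(7)
```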

Returning to the general domain $R[\vartheta]$, we say that an element $\pi$ is prime if it is not a unit and has no factors other than its associates and units.


THEOREM 2-17. Every nonunit element of $R[\vartheta]$ can be written as a finite product of primes.


Proof: If $\alpha$ in $R[\vartheta]$ is not a unit, $|N\alpha| > 1$. If $\alpha$ is prime, we have the trivial representation $\alpha = \alpha$. If not, there is a factorization $\alpha = \beta\gamma$ into nonunits, and $N\alpha = N\beta \cdot N\gamma$, where

\[ 1 < |N\beta| < |N\alpha|, \qquad 1 < |N\gamma| < |N\alpha|. \]

If either $\beta$ or $\gamma$ is not prime, it may be factored. The process must terminate, since the rational integer $N\alpha$ has only finitely many divisors of absolute value greater than 1.


*This result is given in Chapter 8, Volume I. The solutions for given d 
can be found explicitly with the aid of Theorem 9-6 of that volume. 


To see that this factorization need not be unique, consider the two representations

\[ 21 = 3 \cdot 7 = (4 + \sqrt{-5})(4 - \sqrt{-5}) \]

of 21 in $R[\sqrt{-5}]$. Since $-5 \equiv 3 \pmod 4$, the integers of this domain are $a + b\sqrt{-5}$, with $a$ and $b$ in $Z$, and the units are $\pm 1$. It is clear that no two of the numbers $3$, $7$, $4 + \sqrt{-5}$, $4 - \sqrt{-5}$ are associates, and we can also show that all of them are primes in $R[\sqrt{-5}]$. Suppose that

\[ (a_1 + b_1\sqrt{-5})(a_2 + b_2\sqrt{-5}) = 3. \]

Then

\[ N(a_1 + b_1\sqrt{-5})\, N(a_2 + b_2\sqrt{-5}) = N3 = 9, \]

so that if neither factor is a unit, it must be that

\[ N(a_k + b_k\sqrt{-5}) = a_k^2 + 5b_k^2 = 3. \tag{11} \]

This equation, however, has no solution in $Z$. By a similar argument, 7 has no proper divisors, since the equation

\[ a_k^2 + 5b_k^2 = 7 \tag{12} \]

has no solution in $Z$. Finally, an assumed factorization of either $4 \pm \sqrt{-5}$ leads to the equation

\[ N(a_1 + b_1\sqrt{-5}) \cdot N(a_2 + b_2\sqrt{-5}) = 21, \]

which in turn requires that either (11) or (12) hold. Hence $R[\sqrt{-5}]$ is not a unique factorization domain.
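
The arithmetic behind this example is easily verified by machine. The following is a small sketch, with elements $a + b\sqrt{-5}$ stored as pairs `(a, b)`; the helper names `norm`, `mul` and `elements_of_norm` are assumptions made for illustration.

```python
from math import isqrt


def norm(a, b):
    """Norm of a + b*sqrt(-5)."""
    return a * a + 5 * b * b


def mul(u, v):
    """Product of two elements of R[sqrt(-5)] given as pairs (a, b)."""
    return (u[0] * v[0] - 5 * u[1] * v[1], u[0] * v[1] + u[1] * v[0])


def elements_of_norm(n):
    """All (a, b) in Z x Z with a^2 + 5 b^2 = n."""
    bound = isqrt(n)
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]


print(mul((4, 1), (4, -1)))   # (21, 0)
print(elements_of_norm(3))    # []  equation (11) has no solution
print(elements_of_norm(7))    # []  equation (12) has no solution
```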

A domain $R[\vartheta]$ is called a Euclidean domain if for any pair of integers $\beta \neq 0$ and $\alpha$ of $R[\vartheta]$, there is an element $\gamma$ such that

\[ |N(\alpha - \beta\gamma)| < |N\beta|. \]

In this case, there is a Euclidean algorithm by means of which a greatest common divisor can be defined, such that if $(\alpha, \beta) = \delta$, there are integers $\gamma_1$ and $\gamma_2$ in $R[\vartheta]$ for which $\alpha\gamma_1 + \beta\gamma_2 = \delta$. It is this last property which is essential for unique factorization, since from it we get the result, equivalent to the Unique Factorization Theorem, that if $\beta \mid \alpha\gamma$ and $(\beta, \alpha) = 1$ then $\beta \mid \gamma$. For if $\gamma_1\alpha + \gamma_2\beta = 1$, then $\gamma_1\alpha\gamma + \gamma_2\beta\gamma = \gamma$; hence $\beta \mid \gamma$. There is no such GCD in $R[\sqrt{-5}]$.


9-5] IDEALS 57 


For example, 3 and $4 + \sqrt{-5}$ must be considered as relatively prime, since they are nonassociated primes, but if we had

\[ 3(a + b\sqrt{-5}) + (4 + \sqrt{-5})(c + d\sqrt{-5}) = 1, \qquad a, b, c, d \text{ in } Z, \]

it would follow that

\[ 3a + 4c - 5d = 1, \qquad 3b + c + 4d = 0. \]

Subtracting the second equation from the first, we would have

\[ 3(a - b + c - 3d) = 1, \]

which is palpably false.

Every Euclidean domain, then, is a unique factorization domain, although the converse is not true. The quadratic Euclidean domains are completely known: $R[\sqrt{d}]$ is Euclidean if and only if $d$ has one of the 21 values $-11, -7, -3, -2, -1, 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57$, or $73$.
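
As an illustration of the Euclidean algorithm in such a domain, the sketch below works in the Gaussian domain $R[i]$ (the case $d = -1$ of the list), storing $a + bi$ as the pair `(a, b)`. It is a minimal example under the assumption that the quotient is chosen by rounding each coordinate of $\alpha/\beta$ to the nearest integer, which makes $N(\alpha - \beta\gamma) \le \tfrac{1}{2} N\beta$; the helper names are hypothetical.

```python
def g_mul(u, v):
    """Product of Gaussian integers given as pairs (a, b) = a + b*i."""
    return (u[0] * v[0] - u[1] * v[1], u[0] * v[1] + u[1] * v[0])


def g_norm(u):
    return u[0] ** 2 + u[1] ** 2


def nearest(num, den):
    """Nearest integer to num/den for den > 0, using integer arithmetic only."""
    return (2 * num + den) // (2 * den)


def g_divmod(a, b):
    """Return (q, r) with a = q*b + r and N(r) < N(b), for b != 0."""
    n = g_norm(b)
    t = g_mul(a, (b[0], -b[1]))          # a * conj(b), so a/b = t/n
    q = (nearest(t[0], n), nearest(t[1], n))
    qb = g_mul(q, b)
    return q, (a[0] - qb[0], a[1] - qb[1])


def g_gcd(a, b):
    """A greatest common divisor of a and b, determined up to a unit."""
    while b != (0, 0):
        a, b = b, g_divmod(a, b)[1]
    return a


print(g_gcd((11, 3), (1, 8)))  # (-1, 2), an associate of 1 - 2i
```

Here $11 + 3i = (3 + 2i)(1 + i)(1 - 2i)$ and $1 + 8i = (-1 + 2i)(3 - 2i)$, so the common divisor has norm 5.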


PROBLEMS 


1. Show that $R[\rho]$, where $\rho = (-1 + i\sqrt{3})/2$ is a cube root of unity, is a Euclidean domain. [Compare Theorem 7-6, Volume I.]

2. Find the GCD of $2 + \rho$ and $5 + 7\rho$ in $R[\rho]$.

3. Show that if $d$ is square-free, and if $\Delta$ is the discriminant of $R(\sqrt{d})$, then the numbers 1 and $(\Delta + \sqrt{\Delta})/2$ form a basis of $R[\sqrt{d}]$.


2-5 Ideals. One way of restoring unique factorization consists in 
enlarging the set of possible divisors; we might for example try to 


find entities $A$, $B$, $C$, and $D$ of $R[\sqrt{-5}]$ which are in some sense prime, and such that

\[ 3 = AB, \qquad 7 = CD, \qquad 4 + \sqrt{-5} = AC, \qquad 4 - \sqrt{-5} = BD. \]

Then the two representations of 21 in $R[\sqrt{-5}]$ would no longer differ essentially; instead we would have

\[ 21 = (AB)(CD) = (AC)(BD) = ABCD. \]


To accomplish this without going outside the domain, we make a shift 
of emphasis; rather than asking for the divisors of a given number, 
we look for all the numbers which have a given divisor. Here two 




properties of the divisibility concept, in which the divisor is fixed, come to mind:

(a) If $\gamma \mid \alpha$, then $\gamma \mid \alpha\lambda$ for every integer $\lambda$.

(b) If $\gamma \mid \alpha$ and $\gamma \mid \beta$, then $\gamma \mid \alpha + \beta$.

In other words, the set of multiples of $\gamma$ forms an additive group which is closed under multiplication by elements of the domain (but not necessarily in the set). If $\alpha \mid \beta$, then the set of multiples of $\alpha$ contains the set of multiples of $\beta$. The GCD (if there is one) of $\alpha$ and $\beta$ has as multiples the set of numbers of the form $\alpha' + \beta'$, where $\alpha'$ and $\beta'$ run independently over the multiples of $\alpha$ and $\beta$ respectively, and this set is again an additive group closed under multiplication by elements of the domain.

Because of the repeated occurrence of this special kind of set, we give the name ideal to any subset (containing at least one element besides zero) of an integral domain $R[\vartheta]$ which forms a group under addition and is closed under multiplication by elements of the domain. Since there is no reason to suppose that every ideal of $R[\vartheta]$ consists of all the multiples of a single element of $R[\vartheta]$, we shall designate a general ideal by a capital letter. A principal ideal, consisting of all multiples of a given element $\alpha$ of the domain, will be designated by $[\alpha]$. (It will be clear from the context whether the brackets designate an ideal or the greatest-integer function.) But instead of a single number $\alpha$, we could begin with any finite set $\alpha_1, \ldots, \alpha_m$ of elements of $R[\vartheta]$, and form all expressions

\[ \lambda_1\alpha_1 + \lambda_2\alpha_2 + \cdots + \lambda_m\alpha_m, \]

where $\lambda_1, \ldots, \lambda_m$ run independently over $R[\vartheta]$; the set of such expressions again forms an ideal, which will be designated by $[\alpha_1, \ldots, \alpha_m]$. (The numbers $\alpha_1, \ldots, \alpha_m$ are called generators of the ideal $[\alpha_1, \ldots, \alpha_m]$.) This notation is similar to that for the GCD, if such exists, except that instead of writing $(\alpha, \beta) = \gamma$ we would now write $[\alpha, \beta] = [\gamma]$. (Two ideals are said to be equal if they consist of the same numbers.) It will be shown later that $R[\vartheta]$ is a unique factorization domain if and only if every ideal of $R[\vartheta]$ is a principal ideal. This should not be surprising, since this latter condition simply requires that any two elements of $R[\vartheta]$ should have a GCD in $R[\vartheta]$ which can be expressed as a linear combination of the elements.


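THEOREM 2-18. If $R(\vartheta)$ is of degree $n$, and $A$ is an ideal of $R[\vartheta]$, then there exist elements $\alpha_1, \ldots, \alpha_n$ of $R[\vartheta]$ such that every element of $A$ can be uniquely represented in the form

\[ k_1\alpha_1 + \cdots + k_n\alpha_n, \qquad k_i \text{ in } Z. \]

Remark: Note that the $k$'s are rational integers, and not elements of $R[\vartheta]$. The numbers $\alpha_1, \ldots, \alpha_n$ of the theorem are called a basis of $A$; they may be taken as a set of generators of $A$, but may not be the smallest such set.


Proof: If the polynomial defining an element $\alpha \neq 0$ of $A$ is $p(x)$, then for some $h$, the zeros of $p^h(x)$ are the field conjugates of $\alpha$, so that

\[ p^h(x) = x^n + a_1 x^{n-1} + \cdots \pm N\alpha, \]

and $N\alpha = \pm(\alpha^{n-1} + a_1\alpha^{n-2} + \cdots)\alpha$ is in $A$. Hence $A$ contains a rational integer different from zero, and therefore a smallest positive integer, say $a$. If $\rho_1, \ldots, \rho_n$ is an integral basis of $R[\vartheta]$, then $A$ contains $a\rho_i$ for each $i$. Let $a_{11}$ be the smallest positive rational integer such that the number

\[ \alpha_1 = a_{11}\rho_1 \]

is in $A$. Since $A$ contains $a_{11}\rho_1$ and $a\rho_2$, it contains numbers which are linear combinations of $\rho_1$ and $\rho_2$ with coefficients in $Z$. Of these there is one (not necessarily unique) for which the coefficient of $\rho_2$ is positive and minimal. Let it be

\[ \alpha_2 = a_{21}\rho_1 + a_{22}\rho_2. \]

Similarly, for $\nu = 3, \ldots, n$, put

\[ \alpha_\nu = a_{\nu 1}\rho_1 + a_{\nu 2}\rho_2 + \cdots + a_{\nu\nu}\rho_\nu, \]

where $a_{\nu i}$ is in $Z$ for $1 \le i \le \nu$ and $a_{\nu\nu}$ is positive and minimal for $\alpha_\nu$ in $A$. It is asserted that $\alpha_1, \ldots, \alpha_n$ form a basis of $A$.

Suppose that

\[ \alpha = c_1\rho_1 + \cdots + c_n\rho_n, \qquad c_1, \ldots, c_n \text{ in } Z, \]

is in $A$. Then so also is $\alpha - c\alpha_n$ for every $c$ in $Z$. Since

\[ 0 \le c_n - a_{nn}\left[\frac{c_n}{a_{nn}}\right] < a_{nn}, \]

it follows from the minimality of $a_{nn}$ that in the representation of the number $\alpha - [c_n/a_{nn}]\alpha_n$, the coefficient of $\rho_n$ is 0, so that

\[ \alpha - \left[\frac{c_n}{a_{nn}}\right]\alpha_n = d_1\rho_1 + \cdots + d_{n-1}\rho_{n-1}, \qquad d_1, \ldots, d_{n-1} \text{ in } Z. \]

Repeating the argument, we find that

\[ \alpha - \left[\frac{c_n}{a_{nn}}\right]\alpha_n - \left[\frac{d_{n-1}}{a_{n-1,n-1}}\right]\alpha_{n-1} = e_1\rho_1 + \cdots + e_{n-2}\rho_{n-2}, \qquad e_1, \ldots, e_{n-2} \text{ in } Z. \]

After $n$ steps, we have

\[ \alpha = \left[\frac{c_n}{a_{nn}}\right]\alpha_n + \left[\frac{d_{n-1}}{a_{n-1,n-1}}\right]\alpha_{n-1} + \cdots, \]

the remaining terms being integral multiples of $\alpha_{n-2}, \ldots, \alpha_1$ obtained in the same way; this is the desired representation.

If there were two representations of the same number, their difference would be a nontrivial representation of 0:

\[ k_1\alpha_1 + \cdots + k_n\alpha_n = 0, \qquad k_1^2 + \cdots + k_n^2 > 0. \]

But then also

\[ k_1\alpha_1^{(m)} + \cdots + k_n\alpha_n^{(m)} = 0, \qquad m = 1, 2, \ldots, n, \]

which implies that $\Delta(\alpha_1, \ldots, \alpha_n) = 0$, contrary to the equation

\[ \Delta(\alpha_1, \ldots, \alpha_n) = a_{11}^2 a_{22}^2 \cdots a_{nn}^2\, \Delta(\rho_1, \ldots, \rho_n) \neq 0. \]

The proof is complete.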


From their definitions, it is clear that each coefficient $a_{ii}$ is positive and not larger than $a$, the smallest positive integer in $A$. We would like to show that bounds can also be put on the other coefficients $a_{ij}$, $1 \le j < i \le n$. We have

\[ \begin{aligned} \alpha_1 &= a_{11}\rho_1, \\ \alpha_2 &= a_{21}\rho_1 + a_{22}\rho_2, \\ \alpha_3 &= a_{31}\rho_1 + a_{32}\rho_2 + a_{33}\rho_3, \\ &\ \,\vdots \\ \alpha_n &= a_{n1}\rho_1 + a_{n2}\rho_2 + a_{n3}\rho_3 + \cdots + a_{nn}\rho_n. \end{aligned} \tag{13} \]


THEOREM 2-19. Every ideal in $R[\vartheta]$ has a basis $\alpha_1, \ldots, \alpha_n$, given by (13), in which the numbers $a_{ij}$ are rational integers with

\[ 0 \le a_{ij} < a_{jj} \le a \qquad \text{for } 1 \le j < i \le n. \]


Proof: It is clear that any system of numbers $\alpha_1, \ldots, \alpha_{i-1}, \alpha_i - k\alpha_j, \alpha_{i+1}, \ldots, \alpha_n$, in which $k$ is a rational integer and $j \neq i$, is also a basis of $A$. For if $\alpha$ is in $A$ and

\[ \alpha = k_1\alpha_1 + k_2\alpha_2 + \cdots + k_n\alpha_n, \qquad k_1, \ldots, k_n \text{ in } Z, \]

then

\[ \alpha = k_1\alpha_1 + \cdots + (k_j + k k_i)\alpha_j + \cdots + k_i(\alpha_i - k\alpha_j) + \cdots + k_n\alpha_n. \]

In the set of equations (13), subtract a suitable multiple of $\alpha_{n-1}$ from $\alpha_n$, so that the new coefficient of $\rho_{n-1}$ is non-negative but smaller than $a_{n-1,n-1}$. Then subtract a suitable multiple of $\alpha_{n-2}$, so that the new coefficient of $\rho_{n-2}$ is smaller than $a_{n-2,n-2}$; this does not disturb the coefficient of $\rho_{n-1}$. Continuing the process, we come eventually to a basis element $\alpha_n'$ such that $0 \le a_{ni}' < a_{ii}$ for $i = 1, \ldots, n - 1$. Then we change $\alpha_{n-1}$ by subtracting off suitable multiples of $\alpha_{n-2}, \alpha_{n-3}, \ldots, \alpha_1$, etc. The result is a basis as described in the theorem.
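
For a quadratic domain with integral basis $\rho_1 = 1$, $\rho_2 = \sqrt{d}$ (that is, $d \equiv 2$ or $3 \pmod 4$) the reduced basis of Theorem 2-19 can be computed by the Euclidean algorithm on coordinate vectors. The sketch below is an illustration only and not the construction used in the proof; the name `ideal_basis` and the pair encoding `(x, y)` for $x + y\sqrt{d}$ are assumptions, and the generators are assumed not all zero.

```python
from math import gcd


def ideal_basis(d, gens):
    """Reduced Z-basis of the ideal of Z[sqrt(d)] generated by gens.

    Elements x + y*sqrt(d) are pairs (x, y).  Returns (a11, a21, a22) meaning
    alpha_1 = a11 and alpha_2 = a21 + a22*sqrt(d), with 0 <= a21 < a11.
    """
    vecs = []
    for x, y in gens:
        vecs.append((x, y))        # the generator itself
        vecs.append((d * y, x))    # the generator times sqrt(d)
    a11, top = 0, None
    for v in vecs:
        # Euclidean algorithm on the sqrt(d)-coordinates of v and top.
        while v[1] != 0:
            if top is None:
                top, v = v, (0, 0)
            else:
                q = v[1] // top[1]
                v = (v[0] - q * top[0], v[1] - q * top[1])
                if v[1] != 0:
                    top, v = v, top
        a11 = gcd(a11, v[0])       # v is now a rational integer in the ideal
    if top[1] < 0:
        top = (-top[0], -top[1])
    return a11, top[0] % a11, top[1]


# The ideal [3, 4 + sqrt(-5)] of R[sqrt(-5)] has the basis 3, 1 + sqrt(-5).
print(ideal_basis(-5, [(3, 0), (4, 1)]))  # (3, 1, 1)
```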


COROLLARY. A positive rational integer occurs in only finitely many ideals of $R[\vartheta]$.


This follows immediately from the theorem, for if $a$ is in $A$, then $a_{ii} \le a$, and there are only finitely many sets of coefficients $a_{ij}$ satisfying the conditions of the theorem.


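The discriminant of the elements of a basis of an ideal is called the discriminant of the ideal; its value is independent of the choice of basis. For if $\alpha_1, \ldots, \alpha_n$ and $\alpha_1', \ldots, \alpha_n'$ are bases of $A$, then there are $h_{kl}$ in $Z$ such that

\[ \alpha_k = \sum_{l=1}^{n} h_{kl}\,\alpha_l', \qquad k = 1, \ldots, n, \]

and

\[ \det |h_{kl}| \neq 0. \]

Hence

\[ \Delta(\alpha_1, \ldots, \alpha_n) = (\det |h_{kl}|)^2\, \Delta(\alpha_1', \ldots, \alpha_n'), \]

so that the discriminants have the same sign and

\[ \Delta(\alpha_1', \ldots, \alpha_n') \mid \Delta(\alpha_1, \ldots, \alpha_n). \]

By symmetry,

\[ \Delta(\alpha_1, \ldots, \alpha_n) \mid \Delta(\alpha_1', \ldots, \alpha_n'), \]

and the discriminants are equal.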




PROBLEMS 


1. Show that every ideal in Z is principal. 

2. If $A = [p, a + b\sqrt{d}]$ is an ideal of $R[\sqrt{d}]$, where $p$ is a rational prime and $d$ is a square-free integer not of the form $4k + 1$, show that $p$ and $a + (b - p[b/p])\sqrt{d}$ form a basis for $A$.


2-6 The arithmetic of ideals. Ideals are special kinds of sets of 
elements. The emphasis so far has been on the elements comprising 
the sets. The whole power of the theory of ideals, however, lies in 
considering them not as collections of elements, but as entities in their 
own right, which can be combined according to certain operations. 

The first of these operations is multiplication. If $A = [\alpha_1, \ldots, \alpha_r]$ and $B = [\beta_1, \ldots, \beta_s]$, then the product $AB$ is the ideal

\[ [\alpha_1\beta_1, \ldots, \alpha_1\beta_s, \alpha_2\beta_1, \ldots, \alpha_r\beta_s]. \]

The product ideal does not depend on the representation chosen for $A$ and $B$. To show this, let $AB = C$, and suppose that also

\[ A = [\alpha_1', \ldots, \alpha_{r'}'], \qquad B = [\beta_1', \ldots, \beta_{s'}']. \]

To keep matters straight, designate these last ideals by $A'$ and $B'$, even though they are equal to $A$ and $B$. We must show that every element of $C$ is also an element of $A'B' = C'$, and conversely.

First of all, $\alpha_i'$ is in $A$ and $\beta_j'$ is in $B$, so that we can write

\[ \alpha_i' = \lambda_1\alpha_1 + \cdots + \lambda_r\alpha_r, \qquad \beta_j' = \mu_1\beta_1 + \cdots + \mu_s\beta_s. \]

Hence the number

\[ \alpha_i'\beta_j' = \sum \lambda_k\mu_l\,\alpha_k\beta_l = \sum \nu_{kl}\,\alpha_k\beta_l \]

is in $C$ for $1 \le i \le r'$, $1 \le j \le s'$. Since $C$ is an ideal, every linear combination of the numbers $\alpha_i'\beta_j'$ is in $C$; thus $C'$ is a subset of $C$. Hence $C = C'$, by symmetry.
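
The definition can be tried on the example that motivated Section 2-5. The sketch below multiplies ideals of $R[\sqrt{d}]$ for $d \equiv 2$ or $3 \pmod 4$ by forming all products of generators and then reducing to a $Z$-basis $a_{11}$, $a_{21} + a_{22}\sqrt{d}$. It is an illustration under those assumptions; the helper names `ideal_basis` and `ideal_product_basis` and the pair encoding `(x, y)` for $x + y\sqrt{d}$ are hypothetical.

```python
from math import gcd


def ideal_basis(d, gens):
    """Reduced Z-basis (a11, a21, a22) of the ideal of Z[sqrt(d)] generated by gens."""
    vecs = []
    for x, y in gens:
        vecs.append((x, y))        # the generator itself
        vecs.append((d * y, x))    # the generator times sqrt(d)
    a11, top = 0, None
    for v in vecs:
        while v[1] != 0:           # Euclid on the sqrt(d)-coordinates
            if top is None:
                top, v = v, (0, 0)
            else:
                q = v[1] // top[1]
                v = (v[0] - q * top[0], v[1] - q * top[1])
                if v[1] != 0:
                    top, v = v, top
        a11 = gcd(a11, v[0])
    if top[1] < 0:
        top = (-top[0], -top[1])
    return a11, top[0] % a11, top[1]


def ideal_product_basis(d, gens_a, gens_b):
    """Basis of AB, where AB is generated by all products alpha_i * beta_j."""
    products = [(u[0] * v[0] + d * u[1] * v[1], u[0] * v[1] + u[1] * v[0])
                for u in gens_a for v in gens_b]
    return ideal_basis(d, products)


# [3, 1 + sqrt(-5)] * [3, 1 - sqrt(-5)] has basis 3, 3*sqrt(-5): it is [3].
print(ideal_product_basis(-5, [(3, 0), (1, 1)], [(3, 0), (1, -1)]))  # (3, 0, 3)
```

The result $(3, 0, 3)$ is the basis $3$, $3\sqrt{-5}$ of the principal ideal $[3]$, so these two ideals can serve as the entities $A$ and $B$ with $3 = AB$.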


THEOREM 2-20. If $A$ is an ideal of $R[\vartheta]$, there exists an ideal $B$ of
$R[\vartheta]$ such that $AB$ is a principal ideal $[a]$, where $a$ is in $Z$.


Remark: It is this theorem which is the crux of the whole matter. 
As indicated in the discussion at the beginning of Section 2-5, we are 
trying to enlarge the set of possible divisors of an integer by introduc- 
ing ideal elements. Given any such divisor, there should certainly be 
a second divisor whose product with the first is the original integer. 


2-6] THE ARITHMETIC OF IDEALS 63 


Since we have taken divisors as sets, we must identify the original 
integer with the set of all its multiples. It should be noted that all 
the associates of a given integer generate the same principal ideal. 


Proof: Suppose $A = [\alpha_1, \ldots, \alpha_r]$, and put

\[ f(x) = \alpha_1 + \alpha_2 x + \cdots + \alpha_r x^{r-1}. \]

By representing $\alpha_1, \ldots, \alpha_r$ as polynomials in $\vartheta$, and replacing $\vartheta$ in all the polynomials by $\vartheta_2, \vartheta_3, \ldots, \vartheta_n$ in turn, we get sets $\alpha_1^{(\nu)}, \ldots, \alpha_r^{(\nu)}$, where $\nu = 2, 3, \ldots, n$. We define

\[ g(x) = \prod_{\nu=2}^{n} \left(\alpha_1^{(\nu)} + \alpha_2^{(\nu)} x + \cdots + \alpha_r^{(\nu)} x^{r-1}\right) = \beta_1 + \beta_2 x + \cdots + \beta_s x^{s-1}. \]

The $\beta$'s are symmetric polynomials, with rational integral coefficients, in all the conjugates of $\alpha_1, \ldots, \alpha_r$ except $\alpha_1, \ldots, \alpha_r$ themselves. Hence they are polynomials in $\alpha_1, \ldots, \alpha_r$ with coefficients in $Z$, and therefore are in $R[\vartheta]$. It is asserted that the ideal $B = [\beta_1, \ldots, \beta_s]$ satisfies the conditions of the theorem.

Put

\[ f(x)g(x) = \gamma_1 + \gamma_2 x + \cdots + \gamma_{r+s-1} x^{r+s-2}. \]

Since each $\gamma$ is a symmetric polynomial, with rational integral coefficients, in each $\alpha_i$ and its conjugates, the $\gamma$'s are themselves rational integers. Let their GCD be $a$. Then $a$ can be represented as a linear combination of $\gamma_1, \ldots, \gamma_{r+s-1}$, with coefficients in $Z$; since $\gamma_1, \ldots, \gamma_{r+s-1}$ are obviously in $AB$, $a$ is in $AB$, and so $[a]$ is a subset of $AB$.


If we knew that $a$ divides every product $\alpha_i\beta_j$, then we would know that every element of $AB$ is contained in $[a]$. The proof will therefore be complete when we prove Theorem 2-21, which is A. Hurwitz' extension of a theorem due to Gauss.


THEOREM 2-21. Let

\[ A(x) = \alpha_0 x^r + \cdots + \alpha_r, \qquad B(x) = \beta_0 x^s + \cdots + \beta_s, \]

where $\alpha_0\beta_0 \neq 0$, be polynomials with integral algebraic coefficients. If an algebraic integer $\delta$ divides every coefficient of

\[ C(x) = A(x)B(x) = c_0 x^{r+s} + \cdots + c_{r+s}, \]

in the sense that each quotient $c_i/\delta$ is an algebraic integer, then $\delta$ also divides every product $\alpha_i \cdot \beta_j$.


Proof: First we prove a lemma: if

\[ f(x) = \delta_0 x^u + \cdots + \delta_u, \qquad \delta_0 \neq 0, \]

is any polynomial with integral algebraic coefficients and a zero $\rho$, then $f(x)/(x - \rho)$ has integral coefficients. The proof is by induction on $u$. If $u = 1$, then $f(x) = \delta_0 x + \delta_1$, and

\[ \frac{f(x)}{x - \rho} = \frac{\delta_0 x + \delta_1}{x + \delta_1/\delta_0} = \delta_0 \]

is an integer. Suppose the lemma true for all polynomials of degree less than $u$. Then the polynomial

\[ Q(x) = f(x) - \delta_0 x^{u-1}(x - \rho) \]

has integral algebraic coefficients (by the second part of Theorem 2-10), and has degree less than $u$ and vanishes for $x = \rho$. By the induction hypothesis,

\[ \frac{Q(x)}{x - \rho} = \frac{f(x)}{x - \rho} - \delta_0 x^{u-1} \]

has integral algebraic coefficients, and the same is therefore true of $f(x)/(x - \rho)$. The lemma follows by the induction principle.

By repeated application of the lemma, we deduce that if $f(x) = \delta_0(x - \rho_1) \cdots (x - \rho_u)$, then any product $\delta_0\rho_{i_1} \cdots \rho_{i_k}$ of $\delta_0$ with distinct zeros is an integer.

Returning to Theorem 2-21, suppose that

\[ A(x) = \alpha_0(x - \rho_1) \cdots (x - \rho_r), \qquad B(x) = \beta_0(x - \sigma_1) \cdots (x - \sigma_s). \]

By assumption, the polynomial

\[ \frac{C(x)}{\delta} = \frac{\alpha_0\beta_0}{\delta}(x - \rho_1) \cdots (x - \sigma_s) \]

has integral coefficients, and it follows that any product

\[ \frac{\alpha_0\beta_0}{\delta}\,\rho_{n_1} \cdots \rho_{n_i}\,\sigma_{m_1} \cdots \sigma_{m_j}, \qquad 1 \le n_1 < \cdots < n_i \le r, \quad 1 \le m_1 < \cdots < m_j \le s, \tag{14} \]

is an integer. Since $\alpha_i/\alpha_0$ and $\beta_j/\beta_0$ are, apart from sign, elementary symmetric functions in the $\rho$'s and $\sigma$'s, respectively, the number

\[ \frac{\alpha_i\beta_j}{\delta} = \frac{\alpha_0\beta_0}{\delta} \cdot \frac{\alpha_i}{\alpha_0} \cdot \frac{\beta_j}{\beta_0} \]

is a sum of terms of the form (14), and is therefore an integer. The proof is complete.
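
In the special case of rational integral coefficients the theorem reduces to the statement of Gauss that it extends: the GCD of the coefficients of a product is the product of the GCDs of the coefficients of the factors, so any $\delta$ dividing every coefficient of $C(x)$ divides every product $a_i b_j$. The sketch below checks this special case only, not the general algebraic statement; the helper names `content` and `poly_mul` are assumptions.

```python
from functools import reduce
from math import gcd


def content(p):
    """GCD of the coefficients of an integer polynomial (list of ints)."""
    return reduce(gcd, p, 0)


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


A = [2, 4]        # 2 + 4x
B = [3, 9, 6]     # 3 + 9x + 6x^2
C = poly_mul(A, B)
delta = content(C)
print(C, delta)   # [6, 30, 48, 24] 6
# delta divides every product a_i * b_j, as the theorem asserts.
print(all((a * b) % delta == 0 for a in A for b in B))  # True
```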


THEOREM 2-22. If $AC = BC$, then $A = B$.
Remark: Note that there is no zero ideal.


Proof: Let $D$ be an ideal such that $CD = [e]$, a principal ideal. Then $ACD = BCD$, so $A[e] = B[e]$. Thus $e$ times any element of $A$ is equal to $e$ times some element of $B$, and so $A = B$.

If $A = BC$, then we say that $C$ divides $A$, and write $C \mid A$.


THEOREM 2-23. $A \mid C$ if and only if every element of $C$ is in $A$.


Proof: If $A = [\alpha_1, \ldots, \alpha_r]$ and $B = [\beta_1, \ldots, \beta_s]$, then $AB = C = [\ldots, \alpha_i\beta_j, \ldots]$, so every element of $C$ is in $A$, and also in $B$.

Conversely, suppose that every element of $C$ is in $A$. Then every element of $CD$ is in $AD$, for every $D$. Choose $D$ so that $AD = [e]$ is principal, and let $CD = [\sigma_1, \ldots, \sigma_t]$. Then for each $i$ with $1 \le i \le t$, $\sigma_i = e\lambda_i$ for a suitable integer $\lambda_i$. Hence $CD = [e][\lambda_1, \ldots, \lambda_t] = AD[\lambda_1, \ldots, \lambda_t]$, and by Theorem 2-22, $C = A[\lambda_1, \ldots, \lambda_t]$, so that $A \mid C$.


THEOREM 2-24. An ideal is divisible by only a finite number of 
ideals. 


Proof: If the ideal is A, choose B so that AB = [c], where c is a 
positive integer. Then cis in A and in every divisor of A, and by the 
corollary to Theorem 2-19, there are only finitely many such ideals. 


A common divisor of $A$ and $B$ which is divisible by every common
divisor is called a greatest common divisor (GCD) of $A$ and $B$.


THEOREM 2-25. Every pair of ideals $A$ and $B$ has a unique GCD,
$(A, B)$. It is composed of the numbers $\alpha + \beta$, where $\alpha$ runs over $A$
and $\beta$ over $B$.


Proof: Let $D$ be the set described in the theorem; it is clearly an ideal. Since 0 is in $A$ and $B$, $D$ contains every element of $A$ and of $B$, and so is a divisor of $A$ and of $B$. Any common divisor of $A$ and $B$ contains all the elements of $A$ and of $B$, and since it is closed under addition, it contains all numbers $\alpha + \beta$, and so divides $D$.

If $D'$ is also a GCD of $A$ and $B$, then $D$ and $D'$ are divisors of each other, and so each contains the other. Thus $D = D'$.


If the GCD of $A$ and $B$ is $[1]$, we say that $A$ and $B$ are relatively prime. As an immediate consequence of this definition and Theorem 2-25, we have


THEOREM 2-26. If $A$ and $B$ are relatively prime, there exist $\alpha$ in $A$ and $\beta$ in $B$ such that $\alpha + \beta = 1$.


THEOREM 2-27. If $A \mid BC$ and $A$ is prime to $B$, it divides $C$.


Proof: Choose $\alpha$ in $A$ and $\beta$ in $B$ so that $\alpha + \beta = 1$. Then if $\gamma$ is in $C$, $\alpha\gamma + \beta\gamma = \gamma$, and $\beta\gamma$ and $\alpha\gamma$ are in $A$, so that $\gamma$ is in $A$. Hence $A \mid C$.


If A has no divisors except itself and [1], then A is said to be prime. 


THEOREM 2-28. Every ideal can be represented as a finite product of 
prime ideals, and the representation is unique except for the order of 
factors. 


The finiteness of the representation follows from Theorem 2-24, and 
the uniqueness from Theorem 2-27. 

In particular, it follows that the principal ideal generated by any
element of $R[\vartheta]$ has a unique factorization into prime ideals of $R[\vartheta]$.
If these prime factors are themselves always principal ideals, we might
expect that ideals can be dispensed with entirely, and that there is
then unique factorization of the numbers themselves.


THEOREM 2-29. A necessary and sufficient condition that $R[\vartheta]$ be a
unique factorization domain is that every ideal of $R[\vartheta]$ be a principal
ideal.


Proof: Uniqueness of factorization in $R[\vartheta]$ is equivalent to the
property:

$$\text{if } \alpha \mid \beta\gamma \text{ and } \alpha \text{ and } \beta \text{ are relatively prime, then } \alpha \mid \gamma. \tag{15}$$

For if the domain has this property, unique factorization can be proved
in the usual way, while if factorization is unique and $\alpha \mid \beta\gamma$, then every
prime $\pi$ dividing $\alpha$ must occur in the factorization of $\beta\gamma$; since this
factorization is the product of the factorizations of $\beta$ and $\gamma$, if $\pi$ does
not occur in $\beta$ it must occur in $\gamma$.

Suppose that factorization is unique in $R[\vartheta]$, so that (15) holds.
Then if $\pi$ is a prime number, $[\pi]$ is a prime ideal. For if $[\pi] = AB$,
where neither $A$ nor $B$ is $[\pi]$, there would exist an $\alpha$ in $A$ and a $\beta$ in $B$,
neither of which is divisible by $\pi$, while their product is.

Let $P$ be any prime ideal, and $\alpha = \pi_1^{a_1} \cdots \pi_r^{a_r}$ any element of $P$.
Then

$$[\alpha] = [\pi_1]^{a_1} \cdots [\pi_r]^{a_r},$$

and since $\alpha$ is in $P$, so is every element of $[\alpha]$, whence $P \mid [\alpha]$ and $P$ is
one of the principal ideals $[\pi_i]$. Since every prime ideal is principal,
every ideal is principal.

Now suppose that every ideal in $R[\vartheta]$ is principal, and that $\alpha$ and $\beta$
are relatively prime. Then $[\alpha, \beta] = [\gamma]$, for some $\gamma$, and every linear
combination $\lambda\alpha + \mu\beta$ is a multiple of $\gamma$. Taking $\lambda = 1$ and $\mu = 0$,
we have $\gamma \mid \alpha$; for $\lambda = 0$ and $\mu = 1$, we obtain $\gamma \mid \beta$. Hence $\gamma$ is a unit,
$[\alpha, \beta] = [1]$, and we can take $\gamma = 1$. Thus there are $\lambda$ and $\mu$ such
that $\lambda\alpha + \mu\beta = 1$, so that if $\alpha \mid \beta\gamma$, then $\alpha$ divides $\lambda\alpha\gamma + \mu\beta\gamma = \gamma$,
and (15) holds. Hence factorization is unique.


PROBLEMS 


1. Using Theorem 2-21, reformulate and prove the new version of
Eisenstein's irreducibility criterion, as given in Problem 1, Section 2-2.

2. Show that if $A = [a_1 + b_1\sqrt{d},\ a_2 + b_2\sqrt{d}]$ is an ideal of $R[\sqrt{d}]$, then
the product of $A$ with its conjugate ideal $A' = [a_1 - b_1\sqrt{d},\ a_2 - b_2\sqrt{d}]$ is
principal.


2-7 Congruences. The norm of an ideal. Two elements $\alpha$ and $\beta$
of $R[\vartheta]$ will be said to be congruent modulo an ideal $A$ if their difference
lies in $A$, that is, if $A$ divides the ideal $[\alpha - \beta]$. This is a natural
extension of the earlier notion of congruence of rational integers, if
the modulus $m$ is identified with the principal ideal $[m]$. The familiar
properties of congruences are easily seen to hold.

For fixed $\alpha$, the set of all elements of $R[\vartheta]$ which are congruent to $\alpha$
modulo $A$ is called a residue class modulo $A$.


THEOREM 2-30. There are only finitely many residue classes 
modulo A. 


Proof: Choose $B$ so that $AB = [c]$, where $c$ is a rational integer.
Then $\alpha_1 \not\equiv \alpha_2 \pmod{A}$ implies that $\alpha_1 \not\equiv \alpha_2 \pmod{[c]}$, since $A \mid [c]$
and therefore $A$ contains $[c]$. So if we can show that there are only
finitely many elements, no two of which are congruent modulo $[c]$,
the theorem follows. But this is an immediate consequence of the
fact that in the basis representation

$$\alpha = r_1\omega_1 + \cdots + r_n\omega_n,$$

where $\omega_1, \ldots, \omega_n$ form an integral basis of $R[\vartheta]$, each of the rational
integral coefficients $r_1, \ldots, r_n$ has only $c$ possible values modulo $c$,
and that if

$$r_i \equiv r_i' \pmod{c}, \quad i = 1, \ldots, n,$$

then

$$r_1\omega_1 + \cdots + r_n\omega_n \equiv r_1'\omega_1 + \cdots + r_n'\omega_n \pmod{[c]}.$$


The number of residue classes modulo $A$ is called the norm of $A$,
written $NA$. For the time being, it is necessary to distinguish between
$N\alpha$ and $N[\alpha]$, the norms of the number $\alpha$ and the ideal $[\alpha]$,
respectively. However, we shall soon see that the two quantities are
essentially the same.


THEOREM 2-31. If $R(\vartheta)$ has discriminant $\Delta$, and $A$ is an ideal of
$R[\vartheta]$ having discriminant $\Delta(A)$, then

$$\Delta(A) = (NA)^2 \Delta.$$

Proof: Let $\alpha_1, \ldots, \alpha_n$ be the basis of $A$ described in Theorem 2-19,
and let $\rho_1, \ldots, \rho_n$ be a basis of $R[\vartheta]$. Then

$$\Delta(A) = (a_{11} a_{22} \cdots a_{nn})^2 \Delta,$$

and we must show that $NA = a_{11} \cdots a_{nn}$; that is, that there are
$a_{11} \cdots a_{nn}$ numbers of $R[\vartheta]$, no two of which are congruent modulo $A$
and such that every element of $R[\vartheta]$ is congruent to one of them. We
show that this is true of the numbers

$$r_1\rho_1 + \cdots + r_n\rho_n,$$

where $0 \le r_k < a_{kk}$ for $k = 1, \ldots, n$. If two of these numbers are
congruent, say

$$r_1\rho_1 + \cdots + r_n\rho_n \equiv r_1'\rho_1 + \cdots + r_n'\rho_n \pmod{A},$$

and $r_n \ge r_n'$, then

$$(r_1 - r_1')\rho_1 + \cdots + (r_n - r_n')\rho_n \equiv 0 \pmod{A}.$$


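But $a_{nn}$ is the smallest positive rational integer for which any number
of the form

$$s_1\rho_1 + \cdots + s_{n-1}\rho_{n-1} + a_{nn}\rho_n$$

is in $A$; since $0 \le r_n - r_n' < a_{nn}$, it follows that $r_n - r_n' = 0$.
Similarly, $r_{n-1} = r_{n-1}', \ldots, r_1 = r_1'$.

If

$$\beta = s_1\rho_1 + \cdots + s_n\rho_n,$$

then

$$\beta - \left[\frac{s_n}{a_{nn}}\right]\alpha_n = s_1'\rho_1 + \cdots + s_{n-1}'\rho_{n-1} + b_n\rho_n,$$

where $0 \le b_n < a_{nn}$. By iteration,

$$\beta - \left[\frac{s_n}{a_{nn}}\right]\alpha_n - \left[\frac{s_{n-1}'}{a_{n-1,n-1}}\right]\alpha_{n-1} - \cdots = b_1\rho_1 + \cdots + b_n\rho_n,$$

where $0 \le b_k < a_{kk}$ for $k = 1, \ldots, n$, and

$$\beta \equiv b_1\rho_1 + \cdots + b_n\rho_n \pmod{A}.$$

COROLLARY. $N[\alpha] = |N\alpha|$.

For $\alpha\rho_1, \ldots, \alpha\rho_n$ is clearly a basis for $[\alpha]$, and

$$\Delta(\alpha\rho_1, \ldots, \alpha\rho_n) = (N\alpha)^2 \Delta,$$

so that $(N\alpha)^2 = (N[\alpha])^2$. But $N[\alpha]$, being the number of residue
classes, is positive.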
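
The corollary can be checked by direct counting in a small case. The following Python sketch works in the Gaussian integers $R[i]$, representing $x + yi$ by the pair `(x, y)`; it counts residue classes modulo a principal ideal $[\alpha]$ by brute force and compares the count with $N\alpha = a^2 + b^2$ for $\alpha = a + bi$. The function names are illustrative, and the search box is justified by Theorem 2-35 ($N\alpha$ lies in $[\alpha]$, so both coordinates may be reduced modulo $N\alpha$).

```python
def in_principal_ideal(x, alpha):
    """True if the Gaussian integer x = (x0, x1) is a multiple of alpha = (a, b)."""
    a, b = alpha
    n = a * a + b * b                 # N(alpha) = alpha * conj(alpha)
    # x / alpha = x * conj(alpha) / N(alpha); both coordinates must be integers.
    re = x[0] * a + x[1] * b
    im = x[1] * a - x[0] * b
    return re % n == 0 and im % n == 0


def count_residue_classes(alpha):
    """Count residue classes of the Gaussian integers modulo [alpha], alpha != 0."""
    a, b = alpha
    n = a * a + b * b
    representatives = []
    for u in range(n):
        for v in range(n):
            # keep (u, v) only if it is incongruent to every representative so far
            if not any(in_principal_ideal((u - s, v - t), alpha)
                       for s, t in representatives):
                representatives.append((u, v))
    return len(representatives)


for alpha in [(2, 1), (3, 0), (1, 1)]:
    print(alpha, count_residue_classes(alpha))
```

In each case the count agrees with $a^2 + b^2$, as the corollary predicts.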


THEOREM 2-32. If $A$ and $B$ are ideals, then there is an $\alpha$ in $A$
such that $([\alpha], AB) = A$.


If such an $\alpha$ exists, then clearly $[\alpha] = AC$, where $(B, C) = [1]$.
If we rephrase the theorem, its close relation to Theorem 2-20
becomes clear: given two ideals $A$ and $B$, there is a $C$ such that $AC$ is
principal and $(B, C) = [1]$.


Proof: Let $P_1, \ldots, P_r$ be the distinct primes dividing $AB$, and let

$$A = P_1^{e_1} \cdots P_r^{e_r}, \quad e_i \ge 0.$$

Put

$$D_i = \prod_{\substack{1 \le j \le r \\ j \ne i}} P_j^{e_j + 1}, \quad i = 1, \ldots, r.$$

Since $(D_1, \ldots, D_r) = [1]$, there are numbers $\delta_i$ in $D_i$, for $i = 1, \ldots, r$,
such that

$$\delta_1 + \cdots + \delta_r = 1.$$

Then $[\delta_i]$ is divisible by $D_i$, and therefore by $P_k$ for $k \ne i$, and
therefore not by $P_i$, since 1 is not. Now let $\alpha_i$ be an element of $P_i^{e_i}$ which
does not occur in $P_i^{e_i + 1}$, for $i = 1, \ldots, r$, and put

$$\alpha = \alpha_1\delta_1 + \cdots + \alpha_r\delta_r.$$

Then for each $i$, every term but one in this representation of $\alpha$ occurs
in $P_i^{e_i + 1}$, while the remaining term occurs in $P_i^{e_i}$ but not in $P_i^{e_i + 1}$.
Hence $A \mid [\alpha]$, but

$$\left(\frac{[\alpha]}{A},\ B\right) = [1].$$


THEOREM 2-33. The congruence

$$\alpha\xi \equiv \beta \pmod{A}$$

is solvable if and only if $D \mid [\beta]$, where $D = ([\alpha], A)$. The solution, if
it exists, is unique modulo $A/D$.


Proof: If $\xi$ is a solution, then $\alpha\xi - \beta = \gamma$ is in $A$, and therefore in
$D$. Since also $\alpha\xi$ is in $D$, it follows that $\beta$ is in $D$, so $D \mid [\beta]$.

If $\beta$ is in $D$, then it is the sum of an element of $[\alpha]$ and an element of
$A$; that is, $\beta = \alpha\xi + \delta$. Since $\delta \equiv 0 \pmod{A}$, $\beta \equiv \alpha\xi \pmod{A}$.

If $\alpha\xi \equiv \alpha\xi' \equiv \beta \pmod{A}$, then $\alpha(\xi - \xi') \equiv 0 \pmod{A}$. Hence if
$[\alpha] = DA_1$, and $A = DA_2$, then $(A_1, A_2) = [1]$ and

$$DA_2 \mid DA_1[\xi - \xi'],$$
$$A_2 \mid A_1[\xi - \xi'],$$
$$\xi \equiv \xi' \pmod{A_2}.$$

THEOREM 2-34. $N(AB) = NA \cdot NB$.

Proof: By Theorem 2-32, there is a $\gamma$ such that

$$([\gamma], AB) = A.$$

Let $NA = n_1$, $NB = n_2$, and let $\alpha_1, \ldots, \alpha_{n_1}$ and $\beta_1, \ldots, \beta_{n_2}$ be
complete residue systems modulo $A$ and $B$, respectively. We shall show
that the $n_1 n_2$ numbers $\alpha_i + \gamma\beta_j$ form a complete residue system
modulo $AB$.

If

$$\alpha_i + \gamma\beta_j \equiv \alpha_k + \gamma\beta_l \pmod{AB},$$

then

$$\gamma(\beta_j - \beta_l) \equiv \alpha_k - \alpha_i \pmod{AB},$$

and by Theorem 2-33, $([\gamma], AB) \mid [\alpha_k - \alpha_i]$, so that $A \mid [\alpha_k - \alpha_i]$. But
this gives $\alpha_k \equiv \alpha_i \pmod{A}$, so $k = i$. Moreover,

$$\gamma(\beta_j - \beta_l) \equiv 0 \pmod{AB},$$
$$\beta_j - \beta_l \equiv 0 \pmod{B},$$
$$j = l.$$

To show that every integer $\delta$ is congruent to one of the above
numbers, choose $\alpha_i$ so that $\delta \equiv \alpha_i \pmod{A}$. Then the congruence

$$\gamma\xi \equiv \delta - \alpha_i \pmod{AB}$$

is solvable, since $([\gamma], AB) = A$ is a divisor of $[\delta - \alpha_i]$. Finally, $\xi$ is
unique modulo $B$, and can therefore be taken to be one of the
numbers $\beta_j$.
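
The construction in this proof can be traced in the simplest possible setting. The Python sketch below takes the ideals $A = [a]$ and $B = [b]$ of $Z$ (an assumption made only for illustration), for which $\gamma = a$ satisfies $([\gamma], AB) = A$, and confirms that the numbers $\alpha_i + \gamma\beta_j$ form a complete residue system modulo $ab$, whether or not $a$ and $b$ are relatively prime. The function name is hypothetical.

```python
def combined_system(a, b):
    """Residues modulo a*b of alpha + gamma*beta, where alpha runs over a complete
    residue system mod a, beta over one mod b, and gamma = a."""
    gamma = a
    return sorted((alpha + gamma * beta) % (a * b)
                  for alpha in range(a)
                  for beta in range(b))


# 4 and 6 are not relatively prime, yet the 24 numbers are pairwise incongruent.
print(combined_system(4, 6) == list(range(24)))
```

Each residue modulo $ab$ appears exactly once, so that $N[ab] = N[a] \cdot N[b]$ in this case.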


THEOREM 2-35. $NA$ is an element of $A$.


Proof: If $\alpha_1, \ldots, \alpha_{NA}$ is a complete residue system modulo $A$, then
so is $\alpha_1 + 1, \ldots, \alpha_{NA} + 1$. Hence

$$\alpha_1 + \cdots + \alpha_{NA} \equiv (\alpha_1 + 1) + \cdots + (\alpha_{NA} + 1) \pmod{A},$$
$$0 \equiv NA \pmod{A}.$$


COROLLARY. There are only finitely many ideals of given norm.


For by the corollary to Theorem 2-19, a positive rational integer
occurs in only finitely many ideals.


PROBLEMS 
1. Show that if $P$ is a prime ideal of $R[\vartheta]$, the congruence

$$x^m + \alpha_1 x^{m-1} + \cdots + \alpha_m \equiv 0 \pmod{P}$$

with coefficients in $R[\vartheta]$ has at most $m$ incongruent solutions modulo $P$.


2. Show that if $P$ is a prime ideal of $R[\vartheta]$, $\alpha$ is an element of $R[\vartheta]$, and
$P \nmid [\alpha]$, then

$$\alpha^{NP - 1} \equiv 1 \pmod{P}.$$


2-8 Prime ideals

THEOREM 2-36. If $NA$ is prime, so is $A$.

This follows immediately from Theorem 2-34.


THEOREM 2-37. There are infinitely many prime ideals $P$ in any
domain $R[\vartheta]$. Each such $P$ divides exactly one rational prime $p$, and
$NP = p^f$, where $f$, called the degree of $P$, is a positive integer not
exceeding the degree of $R(\vartheta)$.


Proof: Let $p$ be a rational prime, and let $P$ be one of the factors of
$[p]$ in $R[\vartheta]$. Then if $P$ also divided the ideal defined by another
rational prime $p'$, it would divide their GCD, which is $[1]$. Hence each
$P$ divides at most one $p$, and each of the infinitely many rational
primes $p$ is divisible by at least one $P$, so that there must be infinitely
many $P$'s.

Now let $a$ be a rational integer such that $P \mid [a]$; by Theorem 2-35,
we could take $a = NP$. If $a = p_1 \cdots p_r$, then

$$P \mid [p_1] \cdots [p_r],$$

and so $P \mid [p_i]$ for some $i$.

Finally, if $P \mid [p]$ then $[p] = PA$ for some $A$. By the corollary to
Theorem 2-31,

$$N[p] = |Np| = p^n,$$

and so $N(PA) = NP \cdot NA = p^n$. Hence $NP \mid p^n$, and the proof is
complete.


Theorem 2-37 shows that the primes of $R[\vartheta]$ are to be found among
the factors of the principal ideals $[p]$. Only partial information is
available about the way these ideals decompose, and the derivation of
most of what is known is too intricate for inclusion here, but we can
prove the simpler half of a famous theorem due to Dedekind, which
states that $[p]$ is divisible by the square of a prime ideal in $R[\vartheta]$ if and
only if $p$ divides $\Delta$, the discriminant of $R(\vartheta)$.


THEOREM 2-38. If $p$ does not divide $\Delta$, then $[p]$ factors as a product
of (one or more) distinct prime ideals.


Proof: Suppose that $P^2 \mid [p]$, so that $[p] = P^2 M$. Choose an element
$\alpha$ of $PM$ which does not belong to $P^2 M$, so that $p \mid \alpha^2$ but $p \nmid \alpha$. Since
$p \ge 2$, $p \mid (\alpha\beta)^p$ for every $\beta$ in $R[\vartheta]$.


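For an arbitrary element $\gamma$ of $R(\vartheta)$, define $S\gamma$, the trace of $\gamma$, by
the equation

$$S\gamma = \gamma' + \cdots + \gamma^{(n)},$$

where $\gamma', \ldots, \gamma^{(n)}$ are the field conjugates of $\gamma$. By the Symmetric
Function Theorem, $S\gamma$ is in $Z$ if $\gamma$ is an integer, and it is clear that
$S(r\gamma) = rS\gamma$ if $r$ is rational. In particular,

$$S\left(\frac{(\alpha\beta)^p}{p}\right) = \frac{S((\alpha\beta)^p)}{p}$$

is in $Z$, so that $S((\alpha\beta)^p)$ is in $[p]$. By the multinomial theorem, if
$\beta', \ldots, \beta^{(n)}$ are the field conjugates of $\beta$, then

$$(S(\alpha\beta))^p = (\alpha'\beta' + \cdots + \alpha^{(n)}\beta^{(n)})^p \equiv (\alpha'\beta')^p + \cdots + (\alpha^{(n)}\beta^{(n)})^p = S((\alpha\beta)^p) \equiv 0 \pmod{p},$$

and since $S(\alpha\beta)$ is a rational integer, $p \mid S(\alpha\beta)$.

Now let $\rho_1, \ldots, \rho_n$ be an integral basis for $R[\vartheta]$. Then

$$\alpha = h_1\rho_1 + \cdots + h_n\rho_n,$$

where the $h$'s are rational integers not all divisible by $p$, since $p \nmid \alpha$.
For $1 \le i \le n$ we have

$$S(\alpha\rho_i) = S\left(\sum_{j=1}^{n} h_j\rho_i\rho_j\right) = \sum_{j=1}^{n} h_j S(\rho_i\rho_j).$$

Let

$$d = \det |S(\rho_i\rho_j)|,$$

and let $A_{ij}$ be the cofactor of $S(\rho_i\rho_j)$ in $d$. Then for $k = 1, 2, \ldots, n$,

$$\sum_{i=1}^{n} A_{ik} \sum_{j=1}^{n} h_j S(\rho_i\rho_j) = \sum_{j=1}^{n} h_j \sum_{i=1}^{n} A_{ik} S(\rho_i\rho_j) = d h_k.$$

Since

$$p \ \Big|\ \sum_{j=1}^{n} h_j S(\rho_i\rho_j)$$

for each $i$, it is also true that $p \mid d h_k$ for each $k$; $p$ therefore divides $d$.
Finally,

$$d = \det |S(\rho_i\rho_j)| = \det \Big|\sum_{k=1}^{n} \rho_i^{(k)}\rho_j^{(k)}\Big| = \det |\rho_i^{(k)}|^2 = \Delta;$$

hence $p \mid \Delta$.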


As an illustration of the present theorem, note that in the field
$R(i)$, of discriminant $-4$, we have

$$[2] = [1 + i]^2,$$
$$[p] = [a + bi][a - bi], \quad \text{if } a^2 + b^2 = p \equiv 1 \pmod{4},$$
$$[q] = P, \text{ a prime ideal}, \quad \text{if } q \equiv 3 \pmod{4}.$$

Here $[1 + i]$, $[a + bi]$, and $[a - bi]$ are prime ideals of degree 1, while
each $P$ is of degree 2.*
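
The three cases of this illustration are easy to tabulate. The following Python sketch assumes its argument is a rational prime (this is not checked) and returns which case applies, together with a generator $a + bi$ of one prime factor where one exists; the function name and the labels "ramified", "split", and "inert" are conventional terms chosen here rather than taken from the text.

```python
from math import isqrt


def gaussian_splitting(p):
    """Describe how [p] factors in the Gaussian integers, for a rational prime p.

    Returns (kind, generator), where generator = (a, b) stands for a + b*i.
    """
    if p == 2:
        return "ramified", (1, 1)          # [2] = [1 + i]^2
    if p % 4 == 3:
        return "inert", None               # [p] is itself prime, of degree 2
    # p = 1 (mod 4): search for a^2 + b^2 = p, so [p] = [a + bi][a - bi]
    a = 1
    while a * a < p:
        b_squared = p - a * a
        b = isqrt(b_squared)
        if b * b == b_squared:
            return "split", (a, b)
        a += 1
    raise ValueError("argument is not a prime congruent to 1 modulo 4")


for p in [2, 3, 5, 7, 13]:
    print(p, gaussian_splitting(p))
```

Only the prime 2, the single prime dividing the discriminant $-4$, yields a repeated factor, in agreement with Theorem 2-38.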


THEOREM 2-39. Each ideal $[p]$, where $p$ is a rational prime, splits
into at most $n$ ideal factors in the integral domain of a field of degree $n$.

Proof: If $[p] = P_1 \cdots P_r$, then

$$p^n = N[p] = NP_1 \cdots NP_r,$$

and for each $i$, $NP_i > 1$. Hence $r \le n$.


PROBLEMS 

1. In the domain $R[\sqrt{-5}]$, put

$$A = [3,\ 4 + \sqrt{-5}], \quad B = [3,\ 4 - \sqrt{-5}], \quad C = [7,\ 4 + \sqrt{-5}], \quad D = [7,\ 4 - \sqrt{-5}].$$

Show that $AB = [3]$, $CD = [7]$, $AC = [4 + \sqrt{-5}]$, $BD = [4 - \sqrt{-5}]$,
and that $A$, $B$, $C$, and $D$ are prime ideals. Factor $[1 + 2\sqrt{-5}]$.


2. Let $R(\sqrt{d})$, where $d$ is square-free, have discriminant $\Delta$. If $q$ is an
odd prime dividing $\Delta$, show that the ideals

$$\left[q,\ \frac{\Delta + \sqrt{\Delta}}{2}\right] \quad \text{and} \quad \left[q,\ \frac{\Delta - \sqrt{\Delta}}{2}\right]$$

are equal, and that their product is $[q]$. Show also that if $\Delta$ is even, then
$[2] = [2, \sqrt{d}]^2$ for $d \equiv 2 \pmod{4}$ and that $[2] = [2, 1 + \sqrt{d}]^2$ if
$d \equiv 3 \pmod{4}$. This completes the proof of Dedekind's theorem, stated just
before Theorem 2-38, in the case of a quadratic field.


*Compare the remark following Theorem 7-7, Volume I.


2-9 Units of algebraic number fields. We saw in Section 2-4 that
the units of a quadratic field $R(\sqrt{d})$ are determined by the solutions
of the Pell equation with $N = \pm 1$ or $\pm 4$, and it is an easy
consequence of this relationship and standard properties of Pell's equation*
that the group of units of $R(\sqrt{d})$ has a basis, consisting of $-1$ and
the fundamental solution $\varepsilon$ of the appropriate Pell equation. That is,
every unit of $R(\sqrt{d})$ can be written in the form $(-1)^a \varepsilon^b$, where $a$ is
0 or 1 and $b$ ranges over $Z$. We shall now show that this property is
not peculiar to quadratic fields, but that in fact the group of units in
each algebraic number field has a finite basis. (In general, if $G$ is a
commutative multiplicative group and $b_1, \ldots, b_m$ are elements of $G$,
they are said to form a basis for $G$ if every element of $G$ can be
represented in the form $b_1^{n_1} \cdots b_m^{n_m}$, and in every such representation of
the unit element $e$ of $G$, the factor $b_i^{n_i} = e$ for $1 \le i \le m$.) This
theorem, which is due to Dirichlet, can be sharpened by giving
the exact number of basis elements, but for many purposes, including
the application to be made in the next chapter, the finiteness of the
basis suffices. The upper bound which we shall obtain is actually the
correct number.
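
As a concrete instance of the quadratic case, the Python sketch below treats only the equation $x^2 - dy^2 = 1$ (the case $N = +1$; the cases $N = -1$ and $N = \pm 4$ mentioned above are deliberately left out). It finds the smallest positive solution by brute force and forms powers of $x + y\sqrt{d}$, each of which is again a solution. The function names and the search limit are assumptions made for illustration.

```python
from math import isqrt


def fundamental_pell_solution(d, limit=10**6):
    """Smallest positive (x, y) with x*x - d*y*y == 1, found by trying y = 1, 2, ...

    d is assumed to be a positive non-square integer.
    """
    for y in range(1, limit):
        x_squared = 1 + d * y * y
        x = isqrt(x_squared)
        if x * x == x_squared:
            return x, y
    raise ValueError("no solution found below the search limit")


def unit_power(x, y, d, b):
    """Coefficients (u, v) of (x + y*sqrt(d))**b for an integer b >= 0."""
    u, v = 1, 0
    for _ in range(b):
        u, v = u * x + d * v * y, u * y + v * x
    return u, v


x, y = fundamental_pell_solution(7)
print((x, y), unit_power(x, y, 7, 2))
```

For $d = 7$ the search returns $(8, 3)$, and every power $(u, v)$ computed from it again satisfies $u^2 - 7v^2 = 1$.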

We introduce the symbol $\overline{|\alpha|}$ to designate the maximum of the
absolute values of the conjugates of the algebraic number $\alpha$, and
denote by $K$ a fixed algebraic number field.


THEOREM 2-40. If $a$ is a fixed positive number, there are only
finitely many integers $\alpha$ of $K$ such that

$$\overline{|\alpha|} < a.$$

If all conjugates of $\alpha$ have absolute value 1, then $\alpha$ is a root of unity.


Proof: If $\overline{|\alpha|} < a$ and $\deg \alpha = n$, then each of the elementary
symmetric functions in $\alpha$ and its conjugates is numerically smaller
than some bound depending only on $a$ and $n$. If $\alpha$ is an integer in $K$,
then $n$ cannot exceed the degree of $K$, so that there are available only
finitely many coefficients for the defining polynomial of $\alpha$, and there
are, therefore, only finitely many such $\alpha$'s in $K$.

If $|\alpha^{(i)}| = 1$ for $i = 1, \ldots, n$, then $\overline{|\alpha^m|} = 1$ for all $m$ in $Z$, so that
by what we have just proved, $\alpha^{m_1} = \alpha^{m_2}$ for some distinct exponents
$m_1$ and $m_2$. Hence $\alpha^{m_1 - m_2} = 1$, so that $\alpha$ is a root of unity.


THEOREM 2-41. The group $U$ of roots of unity in $K$ is a finite cyclic
group.


*See, for example, Theorems 8-5, 8-6, and 8-7, Volume I. 


Proof: If $\zeta$ is a root of unity, then $\overline{|\zeta|} = 1$, and the finiteness of the
group follows from the preceding theorem. Let the various elements
$u_i$ of $U$ be primitive $w_i$-th roots of unity, for $i = 1, \ldots, t$, and put

$$w = \max (w_1, \ldots, w_t).$$

For fixed $i$, the numbers $e^{2\pi i a/w_i}$ and $e^{2\pi i b/w}$ are in $U$, for every $a$ and
$b$ in $Z$. If $(w_i, w) = d$, choose $a$ and $b$ so that $aw + bw_i = d$; then
the product

$$e^{2\pi i (a/w_i + b/w)} = e^{2\pi i d/(w_i w)}$$

is in $U$. It follows that the LCM of $w_i$ and $w$ does not exceed $w$, so that
$w_i \mid w$ for $i = 1, \ldots, t$. Since the powers of

$$e^{2\pi i/w}$$

include all $d$th roots of unity if $d \mid w$, it is clear that $e^{2\pi i/w}$ generates $U$.


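Now let $\vartheta$, of degree $n$, be a primitive element of $K$, so that
$K = R(\vartheta)$, and arrange the conjugates of $\vartheta$ in such an order that
$\vartheta^{(1)}, \ldots, \vartheta^{(r_1)}$ are real, while $\vartheta^{(r_1+1)}, \ldots, \vartheta^{(n)}$ are not real. (Note
that it is not necessarily true that $\vartheta = \vartheta^{(1)}$.) Then $n - r_1$ is an even
number, say $2r_2$, and we can further order the nonreal conjugates so
that $\vartheta^{(r_1+j)}$ and $\vartheta^{(r_1+r_2+j)}$ are complex-conjugate, for $j = 1, \ldots, r_2$.
If $\alpha$ is any number of $K$, the field conjugates of $\alpha$ are such that
$\alpha^{(1)}, \ldots, \alpha^{(r_1)}$ are real, while $\alpha^{(r_1+j)}$ and $\alpha^{(r_1+r_2+j)}$ are
complex-conjugate for $j = 1, \ldots, r_2$. Of course some of these latter numbers
may also be real, but in any case

$$|\alpha^{(r_1+j)}| = |\alpha^{(r_1+r_2+j)}| \quad \text{for } j = 1, \ldots, r_2. \tag{16}$$

If $\varepsilon_1, \ldots, \varepsilon_k$ are units of $K$, they are said to be independent if the
relation

$$\varepsilon_1^{a_1} \cdots \varepsilon_k^{a_k} = 1, \quad a_1, \ldots, a_k \text{ in } Z, \tag{17}$$

holds only for $a_1 = \cdots = a_k = 0$.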


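THEOREM 2-42. Units $\varepsilon_1, \ldots, \varepsilon_k$ in $K$ are independent if and only
if the sole solution of the system

$$\sum_{m=1}^{k} x_m \log |\varepsilon_m^{(i)}| = 0, \quad i = 1, 2, \ldots, r \tag{18}$$

in rational integers is $x_1 = \cdots = x_k = 0$. Here $r = r_1 + r_2 - 1$.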


Proof: Suppose that (17) has a solution in which not all the $a$'s
are zero. Then the analogous equation with each $\varepsilon_m$ replaced by $\varepsilon_m^{(i)}$
also holds, so that

$$|\varepsilon_1^{(i)}|^{a_1} \cdots |\varepsilon_k^{(i)}|^{a_k} = 1, \quad i = 1, \ldots, n,$$

and

$$\sum_{m=1}^{k} a_m \log |\varepsilon_m^{(i)}| = 0, \quad i = 1, \ldots, n. \tag{19}$$

Conversely, if (19) holds with not all the rational integers $a_1, \ldots, a_k$
equal to zero, then $\varepsilon_1^{a_1} \cdots \varepsilon_k^{a_k}$ is an integer of $K$ all of whose
conjugates have absolute value 1; it is therefore a root of unity whose $w$th
power is 1, and (17) holds with $a_1, \ldots, a_k$ replaced by $wa_1, \ldots, wa_k$.
Hence the nontrivial solvability in $Z$ of (19) is equivalent to the
dependence of $\varepsilon_1, \ldots, \varepsilon_k$.

The truth of the theorem will now follow if we can prove that if
the equations (19) hold with $i = 1, \ldots, r_1 + r_2 - 1$, then the
remaining $n - r_1 - r_2 + 1$ equations are also correct. To show this,
suppose that the first $r_1 + r_2 - 1$ equations are true, and define

$$e_i = \begin{cases} 1 & \text{for } 1 \le i \le r_1, \\ 2 & \text{for } r_1 + 1 \le i \le n. \end{cases}$$

Since each $\varepsilon_m$ is a unit, its norm $\varepsilon_m^{(1)} \cdots \varepsilon_m^{(n)}$ has absolute value 1;
by (16),

$$\sum_{i=1}^{n} \log |\varepsilon_m^{(i)}| = \sum_{i=1}^{r_1+r_2} e_i \log |\varepsilon_m^{(i)}| = 0.$$

Hence

$$\sum_{m=1}^{k} a_m \sum_{i=1}^{r_1+r_2} e_i \log |\varepsilon_m^{(i)}| = \sum_{i=1}^{r_1+r_2} e_i \sum_{m=1}^{k} a_m \log |\varepsilon_m^{(i)}| = 0,$$

so that

$$e_{r_1+r_2} \sum_{m=1}^{k} a_m \log |\varepsilon_m^{(r_1+r_2)}| = -\sum_{i=1}^{r_1+r_2-1} e_i \sum_{m=1}^{k} a_m \log |\varepsilon_m^{(i)}| = 0.$$

Thus (19) also holds for $i = r_1 + r_2$, and so, by (16), for

$$i = 1, 2, \ldots, n.$$
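
A numerical illustration of the last step may be helpful. In the real quadratic field $R(\sqrt{2})$ we have $r_1 = 2$, $r_2 = 0$, and $r = 1$, so the system (18) consists of a single equation, and the second equation of (19) follows from the first because the two logarithms of any unit sum to zero. The Python sketch below checks this in floating point for the unit $1 + \sqrt{2}$; the function name is hypothetical, and the comparison with zero is only approximate.

```python
import math


def log_vector(a, b, d):
    """Logarithms of the absolute values of the two real conjugates of
    a + b*sqrt(d), for a positive non-square d."""
    root = math.sqrt(d)
    return (math.log(abs(a + b * root)), math.log(abs(a - b * root)))


first, second = log_vector(1, 1, 2)      # the unit 1 + sqrt(2), of norm -1
print(first, second, first + second)     # the sum is zero up to rounding
```

Since the first logarithm is not zero, the single unit $1 + \sqrt{2}$ is independent in the sense of (17).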
THEOREM 2-43. If the relation (18) holds for some set of real numbers
$x_1, \ldots, x_k$ which are not all zero, it also holds for rational integers
$x_1, \ldots, x_k$ which are not all zero.


Proof: Suppose the hypothesis fulfilled. Since the system (18) is
certainly nontrivially solvable in rational integers if some $\varepsilon_m$ is in $U$,
it suffices to consider the case that all the units are of infinite order.
Then each unit separately is independent. Now suppose that the
units $\varepsilon_1, \ldots, \varepsilon_q$ are such that the equations

$$\sum_{m=1}^{q-1} a_m \log |\varepsilon_m^{(i)}| = 0, \quad i = 1, \ldots, r, \tag{20}$$

have the single real solution $a_1 = \cdots = a_{q-1} = 0$, while the system

$$\sum_{m=1}^{q} a_m \log |\varepsilon_m^{(i)}| = 0, \quad i = 1, \ldots, r, \tag{21}$$

has a nontrivial real solution $a_1, \ldots, a_q$. Then $2 \le q \le k$, $a_q \ne 0$,
and the ratios $a_1/a_q, \ldots, a_{q-1}/a_q$ are uniquely determined, since
otherwise the differences of the respective ratios would provide a
nontrivial solution of (20). If we can show that these ratios are
rational numbers, the theorem will result by taking a suitable common
integral multiple of the numbers

$$\begin{cases} a_m/a_q & \text{for } 1 \le m \le q, \\ 0 & \text{for } q < m \le k. \end{cases}$$

If we put $a_m/a_q = -\beta_m$ for $m = 1, \ldots, q - 1$, equations (21)
imply that

$$\log |\varepsilon_q^{(i)}| = \sum_{m=1}^{q-1} \beta_m \log |\varepsilon_m^{(i)}|, \quad i = 1, \ldots, n. \tag{22}$$

Now consider the set of all units $\eta$ with the property that

$$\log |\eta^{(i)}| = \sum_{m=1}^{q-1} \gamma_m \log |\varepsilon_m^{(i)}|, \quad i = 1, \ldots, n, \tag{23}$$

for suitable real numbers $\gamma_1, \ldots, \gamma_{q-1}$. For such an $\eta$ the coefficients
$\gamma_m$ are unique. We call the set $\gamma_1, \ldots, \gamma_{q-1}$ of real numbers proper
if $\eta$, as defined in (23) with these $\gamma$'s, actually is a unit, and if in
addition $|\gamma_1| \le 1, \ldots, |\gamma_{q-1}| \le 1$. If $\gamma_1, \ldots, \gamma_{q-1}$ is a proper set,
then

$$\log |\eta^{(i)}| \le \sum_{m=1}^{q-1} \big|\log |\varepsilon_m^{(i)}|\big|, \quad i = 1, \ldots, n,$$

and, by Theorem 2-40, there are only finitely many (say $H$) proper
sets. On the other hand, if $\gamma_1, \ldots, \gamma_{q-1}$ is proper, so also is

$$N\gamma_1 - [N\gamma_1], \ldots, N\gamma_{q-1} - [N\gamma_{q-1}],$$

if $N$ is a rational integer. For

$$\sum_{m=1}^{q-1} (N\gamma_m - [N\gamma_m]) \log |\varepsilon_m^{(i)}| = \log |\eta^{(i)}|^N - \sum_{m=1}^{q-1} [N\gamma_m] \log |\varepsilon_m^{(i)}|,$$

which is the logarithm of a product of powers of units, and is
therefore the logarithm of a unit. Now if any $\beta_m$ were irrational, then no
two of the numbers $N\beta_m - [N\beta_m]$, where $N$ runs over $Z$, would be
equal, and we should have infinitely many proper sets. This
contradiction establishes the theorem.


THEOREM 2-44. If $\varepsilon_1, \ldots, \varepsilon_k$ are units such that the only real
solution of (18) is the trivial solution, then there is a rational integer
$M$ with the following property: in order that a number $\eta$ such that

$$\log |\eta^{(i)}| = \sum_{m=1}^{k} \gamma_m \log |\varepsilon_m^{(i)}|, \quad i = 1, \ldots, n,$$

be a unit of $K$, it is necessary that all the numbers $M\gamma_m$ be rational
integers.


Proof: The hypothesis is that which was used in the preceding
proof, except that we have replaced $q - 1$ by $k$. Suppose that
$\gamma_m = a/b$, where $a$ and $b$ are rational integers with $b > 0$ and $(a, b) = 1$,
and $m$ is one of the integers $1, \ldots, k$. Then $N\gamma_m - [N\gamma_m]$ assumes
the $b$ values $0/b, 1/b, \ldots, (b - 1)/b$, so that $b \le H$, where $H$ is the
number of proper sets. Hence, $b \mid H!$, and we can take $M = H!$.


THEOREM 2-45. The group E of all units of K has a finite basis, the 
number of basis elements of infinite order being at most r. 


Proof: The system (18) of $r$ linear homogeneous equations in $k$
unknowns is certainly nontrivially solvable in reals if $k > r$, and it
follows from Theorem 2-43 that there are at most $r$ independent
units in $K$. Let $k$ be the exact maximal number of independent units,
and let $\varepsilon_1, \ldots, \varepsilon_k$ be such a set. Then by Theorem 2-44, for every
unit $\eta$ of $K$ there are $g_1, \ldots, g_k$ in $Z$ such that

$$\log |\eta^{(i)}| = \sum_{m=1}^{k} \frac{g_m}{M} \log |\varepsilon_m^{(i)}|, \quad i = 1, \ldots, n.$$

By the second part of Theorem 2-40, and Theorem 2-41, it follows
that

$$\eta^M = \varepsilon_1^{g_1} \cdots \varepsilon_k^{g_k} \zeta_0^{g_0},$$

where $\zeta_0$ is a generator of $U$ and $g_0$ is in $Z$, so that
$\varepsilon_1, \ldots, \varepsilon_k, \zeta_0$ form a basis for the group of $M$th powers of units.
Now define the numbers $\xi_0, \ldots, \xi_k$ by the equations

$$\xi_0 = \zeta_0^{1/M}, \quad \xi_1 = \varepsilon_1^{1/M}, \quad \ldots, \quad \xi_k = \varepsilon_k^{1/M},$$

where an arbitrary but fixed $M$th root is taken in each case. The
numbers $\xi_m$ may not lie in $K$, but they form a basis for a group $E_M$ of
complex numbers, and $E_M$ clearly contains $E$ as a subgroup. The
theorem is therefore a consequence of the following general principle.


THEOREM 2-46. If $G$ is a commutative group having a basis of $n$
elements, every subgroup of $G$ also has a basis, of at most $n$ elements.


Proof: Suppose that $\lambda_1, \ldots, \lambda_n$ is a basis for $G$, that $S$ is a subgroup
of $G$, and that some $\lambda_i$ actually occurs in the representation of some
element $s$ of $S$. Let $I_i$ be the set of all exponents which occur on $\lambda_i$
in the representations of the various elements of $S$. If $a$ is in $I_i$, so
is $ka$ for $k$ in $Z$, and if $a$ and $a'$ are in $I_i$, so is $a - a'$. Hence $I_i$ is an
ideal in $Z$, and is therefore a principal ideal, say $I_i = [a_i]$.

We now proceed by induction on $n$. If $n = 1$, then $\lambda_1^{a_1}$ is a basis
for $S$, by what we have just proved. Suppose that the theorem is
true for every commutative group with $n - 1$ basis elements, and
suppose that $G$ has $n$ basis elements, say $\lambda_1, \ldots, \lambda_n$. Let $S$ be a
subgroup of $G$. If every element of $S$ can be written in the form

$$\lambda_1^{c_1} \cdots \lambda_{n-1}^{c_{n-1}},$$

the result follows from the induction hypothesis. Otherwise, suppose
that $I_n = [a]$, and let $\lambda$ be an element of $S$ in whose basis
representation $\lambda_n$ occurs with exponent $a$. Then for every $s$ in $S$ there exists a
$b_s$ in $Z$ such that $s\lambda^{b_s}$ has a representation

$$s\lambda^{b_s} = \lambda_1^{c_1} \cdots \lambda_{n-1}^{c_{n-1}}.$$

The set of numbers of the form $s\lambda^{b_s}$ is therefore a subgroup of the
group $G'$ which has $\lambda_1, \ldots, \lambda_{n-1}$ as a basis, and by the induction
hypothesis this subgroup also has a basis, of at most $n - 1$ elements.
This latter basis, together with $\lambda$, clearly constitutes a basis for $S$.


REFERENCES 


Section 2-4 


The complete tabulation of Euclidean domains is the work of many
writers. K. Inkeri (Annales Academiae Scientiarum Fennicae, Series A
(Helsinki) I, Mathematics-Physics, 41, 35 pp. (1947)) supplied the last link
in a chain of theorems which together show that if $d > 100$, then $R(\sqrt{d})$
is not Euclidean. E. S. Barnes and H. P. F. Swinnerton-Dyer (Acta
Mathematica (Stockholm) 87, 259-323 (1952)) showed that, contrary to what
had been believed, $R(\sqrt{97})$ is not Euclidean. P. Varnavides (Proceedings
Konink. Nederlandsche Akademie van Wetenschappen, Series A
(Amsterdam) 55, 111-122 (1952) or Indagationes Mathematicae (Amsterdam) 14,
111-122 (1952)) showed that the values of $d$ listed in the text yield
Euclidean domains.


Section 2-9 


The material of this section is adapted from E. Hecke, Vorlesungen über
die Theorie der algebraischen Zahlen, Leipzig: Akademische
Verlagsgesellschaft m.b.H., 1923; reprinted by Chelsea Publishing Company,
New York, 1948; pp. 116-131. It is proved there that the upper bound
obtained in the text is exact.


CHAPTER 3 


APPLICATIONS TO RATIONAL NUMBER THEORY 


3-1 Introduction. As was suggested in the preceding chapter,
there are many problems in rational number theory which are most
naturally treated in the more extensive framework of an algebraic
number field. Chief among these are various Diophantine equations;
indeed, it was the study of Fermat's equation, $x^n + y^n = z^n$, $n \ge 3$,
which was originally responsible for the development of ideal theory.
While this approach has not led to a complete verification of Fermat's
conjecture in all cases, it has produced results which would probably
never have been obtained using rational methods alone. In the first
part of this chapter we will discuss some results of this kind due to
E. Kummer. Here heavy use will be made of ideal theory.

The latter portion of the chapter is primarily concerned with a
theorem due to B. Delauney and T. Nagell, which asserts that the
cubic analog of Pell's equation,

$$x^3 + dy^3 = 1,$$

has at most one solution in nonzero rational integers $x$, $y$, and
completely characterizes this possible solution. (In the next chapter we
shall prove a less precise result about the general equation $x^n + dy^n = 1$,
$n \ge 3$.) Use is made here of the insolvability in $Z$ of

$$x^3 + y^3 = z^3,$$

but otherwise the two parts are mutually independent.


3-2 Equivalence and class number. We say that the ideals $A$
and $B$ of $R[\vartheta]$ are equivalent, and write $A \sim B$, if there are nonzero
elements $\alpha$ and $\beta$ of $R[\vartheta]$ such that

$$[\alpha]A = [\beta]B.$$

It is easily seen that "$\sim$" is an equivalence relation. Moreover, if
$A \sim B$ and $C \sim D$, then $AC \sim BD$, and if $AC \sim BC$ then $A \sim B$.


THEOREM 3-1. All principal ideals are equivalent. Any ideal 
equivalent to a principal ideal is principal. 


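Proof: The first statement is trivial, since

$$[\alpha][\beta] = [\beta][\alpha].$$

If $A \sim [\alpha]$, then for some $\beta$ and $\gamma$,

$$[\beta]A = [\alpha][\gamma] = [\alpha\gamma],$$

and hence

$$[\beta] \mid [\alpha\gamma],$$
$$\beta \mid \alpha\gamma,$$
$$\alpha\gamma = \beta\delta,$$
$$[\beta]A = [\alpha\gamma] = [\beta][\delta],$$
$$A = [\delta].$$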

Since equivalence is an equivalence relation, the ideals of $R[\vartheta]$ can
be separated into equivalence classes in the usual way. The number
$h$ of such classes is called the class number of the field; according to
Theorem 3-1, $h = 1$ if and only if every ideal is principal, i.e., if and
only if $R[\vartheta]$ is a unique factorization domain. We shall now show that
$h$ is always finite.


THEOREM 3-2. There is a positive constant $c$, which depends only
on the field, such that each ideal $A$ divides a principal ideal $AB$ for
which

$$NAB \le cNA.$$

Proof: Let $\rho_1, \ldots, \rho_n$ be a field basis, and let $\rho_1^{(s)}, \ldots, \rho_n^{(s)}$
($s = 1, \ldots, n$) be the field conjugates of these numbers. We shall
show that the theorem is true with

$$c = \prod_{s=1}^{n} \left(|\rho_1^{(s)}| + \cdots + |\rho_n^{(s)}|\right).$$

Let $A$ be an arbitrary ideal, and let $k$ be the greatest rational
integer not exceeding $(NA)^{1/n}$, so that $k^n \le NA < (k + 1)^n$. Then
if $t_1, \ldots, t_n$ range independently over the integers $0, 1, \ldots, k$, there
are determined $(k + 1)^n$ different integers

$$t_1\rho_1 + \cdots + t_n\rho_n,$$

and two of them must be congruent modulo $A$:

$$u_1\rho_1 + \cdots + u_n\rho_n \equiv v_1\rho_1 + \cdots + v_n\rho_n \pmod{A}.$$

Thus

$$\alpha = (u_1 - v_1)\rho_1 + \cdots + (u_n - v_n)\rho_n$$

is in $A$, so that $A \mid [\alpha]$, and

$$N[\alpha] = |N\alpha| = \prod_{s=1}^{n} \Big|\sum_{i=1}^{n} (u_i - v_i)\rho_i^{(s)}\Big| \le \prod_{s=1}^{n} \sum_{i=1}^{n} |u_i - v_i| \cdot |\rho_i^{(s)}| \le \prod_{s=1}^{n} \sum_{i=1}^{n} k|\rho_i^{(s)}| = ck^n \le cNA.$$
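
The pigeonhole argument can be followed through in a small case. The Python sketch below works in $R[\sqrt{-5}]$ with the basis $\rho_1 = 1$, $\rho_2 = \sqrt{-5}$, so that $c = (1 + \sqrt{5})^2$, and takes the ideal $A = [3, 1 + \sqrt{-5}]$ of norm 3. It relies on the assumption, easily checked by hand but not stated in the text, that $a + b\sqrt{-5}$ lies in $A$ exactly when $a \equiv b \pmod{3}$; all function names are illustrative.

```python
from itertools import product
from math import isqrt, sqrt


def in_A(a, b):
    """Membership of a + b*sqrt(-5) in A = [3, 1 + sqrt(-5)].

    Assumption for this example: the element lies in A iff a = b (mod 3).
    """
    return (a - b) % 3 == 0


def pigeonhole_element(norm_A=3):
    """Find a nonzero alpha in A as a difference of two of the (k+1)^2 integers
    t1 + t2*sqrt(-5) with 0 <= t1, t2 <= k, where k = floor(sqrt(NA))."""
    k = isqrt(norm_A)                      # the degree n is 2 here
    candidates = list(product(range(k + 1), repeat=2))
    for index, u in enumerate(candidates):
        for v in candidates[index + 1:]:
            diff = (u[0] - v[0], u[1] - v[1])
            if in_A(*diff):
                return diff
    return None


alpha = pigeonhole_element()
norm_alpha = alpha[0] ** 2 + 5 * alpha[1] ** 2     # N(a + b*sqrt(-5)) = a^2 + 5b^2
c = (1 + sqrt(5)) ** 2
print(alpha, norm_alpha, c * 3)
```

Here the element found is $-(1 + \sqrt{-5})$, of norm 6, comfortably below the bound $cNA \approx 31.4$.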


THEOREM 3-3. The class number of any algebraic number field is
finite.


Proof: It suffices to show that in each class there is an ideal $B$ such
that $NB \le c$, by the corollary to Theorem 2-35. Let $C$ be an
arbitrary ideal of a given class, and determine $A$ so that $AC$ is principal.
Then by Theorem 3-2, there is an ideal $B$ such that $AB$ is principal
and $NAB \le cNA$. Then $AB \sim AC$, $B \sim C$, and

$$NB = \frac{NAB}{NA} \le c.$$


THEOREM 3-4. If $h$ is the class number, the $h$th power of any ideal is
principal.


Proof: If $A_1, \ldots, A_h$ is a complete system of representatives of the
various classes, and $A$ is arbitrary, then $AA_1, \ldots, AA_h$ is another
such system. Hence

$$A_1 \cdots A_h \sim AA_1 \cdots AA_h = A^h A_1 \cdots A_h,$$

so $A^h \sim [1]$ and $A^h$ is principal.
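
A standard example, not taken from the text, is the ideal $A = [2, 1 + \sqrt{-5}]$ of $R[\sqrt{-5}]$, a field whose class number is known to be 2. The Python sketch below represents $a + b\sqrt{-5}$ by the pair `(a, b)` and checks that the products of the generators of $A$ are all multiples of 2, that 2 is an integral combination of those products (so that $A^2 = [2]$), and that no integer of the domain has norm 2 (so that $A$ itself is not principal). The helper `mul` and the variable names are assumptions made for this illustration.

```python
def mul(x, y, d=-5):
    """Product of x = (a, b) and y = (c, e), each standing for a + b*sqrt(d)."""
    a, b = x
    c, e = y
    return (a * c + d * b * e, a * e + b * c)


gens_A = [(2, 0), (1, 1)]                       # A = [2, 1 + sqrt(-5)]
gens_A2 = [mul(g, h) for g in gens_A for h in gens_A]

# Every generator of A^2 is divisible by 2, so A^2 is contained in [2].
all_even = all(a % 2 == 0 and b % 2 == 0 for a, b in gens_A2)

# Conversely 2 = 2(1 + sqrt(-5)) - (1 + sqrt(-5))^2 - 4 lies in A^2.
p = mul((2, 0), (1, 1))                         # (2, 2)
q = mul((1, 1), (1, 1))                         # (-4, 2)
r = mul((2, 0), (2, 0))                         # (4, 0)
two = (p[0] - q[0] - r[0], p[1] - q[1] - r[1])

# A generator of A would have norm NA = 2, but a^2 + 5b^2 = 2 has no solution.
has_norm_2 = any(a * a + 5 * b * b == 2
                 for a in range(-2, 3) for b in range(-1, 2))

print(all_even, two, has_norm_2)
```

The output `True (2, 0) False` agrees with Theorem 3-4 for $h = 2$.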


THEOREM 3-5. If $p$ is a rational prime and $p \nmid h$, then $A^p \sim B^p$
implies $A \sim B$.


Proof: Since $p \nmid h$, there are positive $x$ and $y$ in $Z$ such that
$px - hy = 1$. From the fact that $A^p \sim B^p$ we have

$$[\alpha]A^p = [\beta]B^p,$$
$$[\alpha]^x A^{px} = [\beta]^x B^{px},$$
$$[\alpha]^x A^{hy} A = [\beta]^x B^{hy} B,$$

and by Theorem 3-4, $A \sim B$.


Theorem 3-5 shows that the primes which do not divide h enjoy a 
property not shared by other primes. This is of great importance in 
the investigation of Fermat’s equation. 


PROBLEMS 


1. Let $\alpha$ and $\beta$ be algebraic integers, not both zero. Show that there is an
integer $\delta$ such that, first, $\delta \mid \alpha$ and $\delta \mid \beta$ (in the sense that $\alpha/\delta$ and $\beta/\delta$ are again
integers), and, second, for suitable integers $\xi$ and $\eta$, $\delta = \alpha\xi + \beta\eta$. Show that
this GCD is unique up to an algebraic unit (i.e., an integer which divides 1).
[Hint: First settle the case $\alpha\beta = 0$. In the other case, let $K$ be an
algebraic number field of class number $h$, containing both $\alpha$ and $\beta$. Then
$[\alpha, \beta]^h = [\gamma]$, for some $\gamma$ in $K$. Let $\delta$ be an integer such that $\delta^h = \gamma$, and
show that the equation $[\alpha, \beta]^h = [\gamma]$ still holds when $[\alpha, \beta]$ and $[\gamma]$ are
interpreted as ideals in $K(\delta)$. Deduce that $[\alpha, \beta] = [\delta]$ in $K(\delta)$.] Does the
Unique Factorization Theorem hold in the domain of all algebraic integers?

2. Let $K$ be an algebraic number field. Show that to each ideal $A$ of $K$
there corresponds an integer $\alpha$ (not necessarily in $K$) such that the elements
of $A$ are exactly those integers of $K$ which are divisible by $\alpha$.


3-3 The cyclotomic field $K_p$. Let $p$ be an odd prime, let

$$\Phi(x) = \frac{x^p - 1}{x - 1} = x^{p-1} + x^{p-2} + \cdots + 1,$$

and let $\zeta = e^{2\pi i/p}$, so that the zeros of $\Phi$ are $\zeta, \zeta^2, \ldots, \zeta^{p-1}$, the
primitive $p$th roots of unity. The field $R(\zeta) = R(\zeta^2) = \cdots =
R(\zeta^{p-1}) = K_p$ is called a cyclotomic field. It is clearly of degree
$p - 1$ at most. We put $1 - \zeta = \pi$. (The fact that the symbol $\pi$ is
used for two different numbers should occasion no confusion; the
number $\pi = 3.14159\ldots$ will occur only in the argument of the
exponential function.)


THEOREM 3-6. In $K_p$, the ideal $[p]$ has the factorization

$$[p] = [\pi]^{p-1};$$

$[\pi]$ is prime, and $N[\pi] = p$; $\Phi$ is irreducible, and $K_p$ is of degree
$p - 1$.


Proof: Since $\zeta$ is an integer of $K_p$, so is

$$\varepsilon_r = \frac{1 - \zeta^r}{1 - \zeta} = 1 + \zeta + \cdots + \zeta^{r-1}, \quad 1 \le r \le p - 1;$$

if now an $r'$ is chosen so that $rr' \equiv 1 \pmod{p}$, then $\zeta^{rr'} = \zeta$, so that

$$\frac{1}{\varepsilon_r} = \frac{1 - \zeta^{rr'}}{1 - \zeta^r} = 1 + \zeta^r + \cdots + \zeta^{r(r'-1)}$$

is also an integer. Hence $\varepsilon_r$ is a unit of $K_p$, and

$$p = \Phi(1) = \prod_{r=1}^{p-1} (1 - \zeta^r) = (1 - \zeta)^{p-1} \prod_{r=1}^{p-1} \varepsilon_r = \varepsilon(1 - \zeta)^{p-1},$$

where $\varepsilon$ is a unit. It follows from this equation that $[p] = [\pi]^{p-1}$,
and also that $N\pi = p$. By Theorem 2-39, $\deg K_p \ge p - 1$, so that
$[\pi]$ is prime, $\deg K_p = p - 1$, and $\Phi$ is irreducible. (For a different
proof of the irreducibility of $\Phi$, see Problem 1, Section 2-2.)
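
The identity $p = \Phi(1) = \prod (1 - \zeta^r)$ on which this proof rests can be checked numerically. The Python sketch below evaluates the product in floating-point complex arithmetic, so the agreement with $p$ is only approximate; the function name is an illustrative choice.

```python
import cmath


def product_one_minus_roots(p):
    """Floating-point value of the product of (1 - zeta**r) for r = 1, ..., p-1,
    where zeta = exp(2*pi*i/p)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    result = 1
    for r in range(1, p):
        result *= 1 - zeta ** r
    return result


for p in [3, 5, 7, 11]:
    print(p, product_one_minus_roots(p))
```

Up to rounding error each product equals $p$, with vanishing imaginary part.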


Hereafter, we designate [z] by P. 
THEOREM 3-7. Writing A(1, ¢,...,¢?~?) = A(¢), we have 
A(s) = (-1) 9 Pp, 


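Proof: From the representations

$$\Phi(x) = \prod_{r=1}^{p-1} (x - \zeta^r) = \frac{x^p - 1}{x - 1}$$

we obtain

$$\Phi'(\zeta^s) = \prod_{\substack{1 \le r \le p-1 \\ r \ne s}} (\zeta^s - \zeta^r) = \frac{p\zeta^{s(p-1)}(\zeta^s - 1) - (\zeta^{ps} - 1)}{(\zeta^s - 1)^2} = \frac{p\zeta^{-s}}{\zeta^s - 1}.$$

Since

$$\Delta(\zeta) = \begin{vmatrix} 1 & \zeta & \cdots & \zeta^{p-2} \\ 1 & \zeta^2 & \cdots & \zeta^{2(p-2)} \\ \vdots & \vdots & & \vdots \\ 1 & \zeta^{p-1} & \cdots & \zeta^{(p-2)(p-1)} \end{vmatrix}^2 = \prod_{1 \le r < s \le p-1} (\zeta^r - \zeta^s)^2,$$

we have

$$\Delta(\zeta) = (-1)^{(p-1)(p-2)/2} \prod_{s=1}^{p-1} \prod_{r \ne s} (\zeta^s - \zeta^r) = (-1)^{(p-1)/2} \prod_{s=1}^{p-1} \Phi'(\zeta^s)$$

$$= (-1)^{(p-1)/2} \frac{p^{p-1}}{\prod_{s=1}^{p-1} \zeta^s (\zeta^s - 1)} = (-1)^{(p-1)/2} p^{p-2}.$$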
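
As a sanity check on Theorem 3-7, the discriminant can be computed numerically as the squared Vandermonde product over the conjugates of zeta. This is a hedged floating-point sketch with illustrative function names; it is reliable only for small p, where rounding error stays negligible.

```python
import cmath
from itertools import combinations


def cyclotomic_discriminant(p):
    """Product over r < s of (zeta^r - zeta^s)^2, computed with complex floats."""
    z = cmath.exp(2j * cmath.pi / p)
    conjugates = [z ** r for r in range(1, p)]
    prod = 1 + 0j
    for u, v in combinations(conjugates, 2):
        prod *= (u - v) ** 2
    return prod


def expected_discriminant(p):
    """The closed form (-1)^((p-1)/2) * p^(p-2) stated in Theorem 3-7."""
    return (-1) ** ((p - 1) // 2) * p ** (p - 2)


if __name__ == "__main__":
    for p in (3, 5, 7, 11):
        print(p, cyclotomic_discriminant(p), expected_discriminant(p))
```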


THEOREM 3-8. The numbers $1, \zeta, \ldots, \zeta^{p-2}$ form an integral basis for $K_p$, so that

$$\Delta = \Delta(\zeta) = (-1)^{(p-1)/2} p^{p-2}.$$
Proof: Suppose that $\alpha$ is an integer of $K_p$, and that

$$\alpha = r_0 + r_1\zeta + \cdots + r_{p-2}\zeta^{p-2},$$

where the $r$'s are rational. Then for $k = 0, 1, \ldots, p - 1$,

$$\zeta^k \alpha = \sum_{j=0}^{p-2} r_j \zeta^{j+k},$$

and since the trace function is clearly additive,

$$S(\zeta^k \alpha) = \sum_{j=0}^{p-2} S(r_j \zeta^{j+k}) = \sum_{j=0}^{p-2} r_j S(\zeta^{j+k}).$$

Solving this system of equations for the numbers $r_j$, we obtain

$$r_j = \frac{\text{determinant in } \alpha \text{ and } \zeta}{\det |S(\zeta^j \zeta^k)|}.$$

But as we saw in the proof of Theorem 2-38, $\det |S(\zeta^j \zeta^k)| = \Delta(\zeta)$; since the determinant in the numerator has the rational value $r_j \Delta(\zeta)$, and is clearly an integer of $K_p$, it is a rational integer. Thus $\alpha$ can be written in the form

$$\alpha = \frac{c_0 + c_1\zeta + \cdots + c_{p-2}\zeta^{p-2}}{p^{p-2}} = \frac{d_0 + d_1\pi + \cdots + d_{p-2}\pi^{p-2}}{p^{p-2}},$$

where the $c$'s, and therefore also the $d$'s, are in $Z$. Since $\alpha$ is an integer,

$$p \mid (d_0 + d_1\pi + \cdots + d_{p-2}\pi^{p-2}),$$

and since $P \mid [p]$,

$$P \mid [d_0 + d_1\pi + \cdots + d_{p-2}\pi^{p-2}],$$

so $P \mid [d_0]$. It follows that $NP \mid N[d_0]$, $p \mid d_0^{\,p-1}$, and finally $p \mid d_0$. This argument may be repeated $p - 2$ times, to show that $p \mid d_k$ for $k = 1, \ldots, p - 2$, so that

$$\alpha = \frac{e_0 + e_1\pi + \cdots + e_{p-2}\pi^{p-2}}{p^{p-3}},$$

where the $e$'s are rational integers. Repeating the entire argument $p - 3$ times, we see that

$$\alpha = f_0 + f_1\pi + \cdots + f_{p-2}\pi^{p-2},$$

where the $f$'s are in $Z$. Hence $1, \pi, \ldots, \pi^{p-2}$ form an integral basis for $K_p$. But from the equations

$$\pi = 1 - \zeta, \qquad \zeta = 1 - \pi,$$

$$\pi^2 = 1 - 2\zeta + \zeta^2 \quad \text{and} \quad \zeta^2 = 1 - 2\pi + \pi^2, \ \ldots,$$

we see that $\Delta(\pi) = a^2 \Delta(\zeta)$ and $\Delta(\zeta) = a^2 \Delta(\pi)$, where $a$ is a certain determinant with binomial coefficients as entries. Hence $a^2 = 1$, $\Delta(\zeta) = \Delta(\pi)$, and $1, \zeta, \ldots, \zeta^{p-2}$ also form an integral basis, by Theorem 2-14.


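THEOREM 3-9. If $\alpha$ is an integer of $K_p$, there is a rational integer $a$ such that

$$\alpha^p \equiv a \pmod{P^p}.$$

Proof: Since $NP = p$, the incongruent numbers $0, 1, \ldots, p - 1$ form a complete residue system modulo $P$, so that for suitable $b$ in $Z$,

$$\alpha \equiv b \pmod{P}.$$

But

$$\alpha^p - b^p = \prod_{k=0}^{p-1} (\alpha - \zeta^k b),$$

and since $\zeta \equiv 1 \pmod{P}$, each of the $p$ factors satisfies $\alpha - \zeta^k b \equiv \alpha - b \equiv 0 \pmod{P}$, whence

$$\alpha^p - b^p \equiv 0 \pmod{P^p},$$

so that we can take $a = b^p$.

If $P \nmid [\alpha]$ and $\alpha \equiv a \pmod{P^2}$ for some $a$ in $Z$, then $\alpha$ is said to be primary.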


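THEOREM 3-10. If $\pi \nmid \alpha$, then for some positive rational integer $f$, $\zeta^f \alpha$ is primary.

Proof: For suitable $a$ and $b$ in $Z$,

$$\alpha \equiv a + b\pi \pmod{P^2},$$

and $\pi \nmid \alpha$, so that $p \nmid a$. Choose $f$ so that

$$af \equiv b \pmod{p}.$$

Then since

$$\zeta^f = (1 - \pi)^f \equiv 1 - f\pi \pmod{P^2},$$

we have

$$\zeta^f \alpha \equiv (1 - \pi f)(a + b\pi) \equiv a + \pi(-af + b) \equiv a \pmod{P^2}.$$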

We now investigate the units of $K_p$.

THEOREM 3-11. The only roots of unity in $K_p$ are the numbers $\pm\zeta^r$, $0 \le r < p$.

Proof: The roots of unity are the numbers

$$e^{2\pi i t/m},$$

where $t$ and $m$ are rational integers, and $(t, m) = 1$. If such a number is in $K_p$, and if $tt' \equiv 1 \pmod{m}$, then also

$$e^{2\pi i tt'/m} = e^{2\pi i/m}$$

is in $K_p$. The numbers mentioned in the theorem are the $(2p)$th roots of unity, so we need only show that $e^{2\pi i/m}$ is not in $K_p$ if $m \nmid 2p$. If $m \nmid 2p$, then either $4 \mid m$, or some odd prime $q \ne p$ divides $m$, or $p^2 \mid m$. Suppose that $e^{2\pi i/m}$ is in $K_p$.

If $4 \mid m$, then

$$e^{2\pi i/4} = i$$

is in $K_p$. But then so are $1 + i$ and $1 - i$, and

$$[1 + i] = [1 - i] \quad \text{and} \quad [2] = [1 + i]^2,$$

contrary to Theorem 2-38.

If $q \mid m$, then

$$\xi = e^{2\pi i/q}$$

is in $K_p$. But then the reasoning used in the proof of Theorem 3-6 shows that

$$[q] = [1 - \xi]^{q-1},$$

again contradicting Theorem 2-38.

If $p^2 \mid m$, then

$$\xi = e^{2\pi i/p^2}$$

is in $K_p$. But $\xi$ is a zero of

$$\frac{x^{p^2} - 1}{x^p - 1} = \prod_{\substack{1 \le m < p^2 \\ p \nmid m}} (x - \xi^m),$$

and

$$p = \prod_{\substack{1 \le m < p^2 \\ p \nmid m}} (1 - \xi^m).$$

As before, the factors in this product are associated, and we get

$$[p] = [1 - \xi]^{p(p-1)},$$

contradicting Theorem 2-39.
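THEOREM 3-12. Each unit $\varepsilon$ of $K_p$ can be written in the form

$$\varepsilon = \zeta^g \eta,$$

where $g$ is a positive rational integer and $\eta$ is real.

Proof: Express $\varepsilon$ in terms of the integral basis $1, \zeta, \ldots, \zeta^{p-2}$:

$$\varepsilon = f(\zeta),$$

where $f$ is a polynomial with rational integral coefficients. Then clearly $\varepsilon_s = f(\zeta^s)$ is also a unit, since $N\varepsilon = \varepsilon_1 \cdots \varepsilon_{p-1} = \pm 1$. Also,

$$\varepsilon_{p-s} = f(\zeta^{p-s}) = f(\zeta^{-s}) = \overline{f(\zeta^s)} = \bar{\varepsilon}_s,$$

where the bar denotes the complex conjugate, so that $\varepsilon_s \varepsilon_{p-s} = |\varepsilon_s|^2 > 0$, and

$$N\varepsilon = \prod_{s=1}^{(p-1)/2} \varepsilon_s \varepsilon_{p-s} > 0,$$

so that $N\varepsilon = 1$. Since $\bar{\varepsilon}_s = \varepsilon_{p-s}$,

$$\left| \frac{\varepsilon_s}{\varepsilon_{p-s}} \right| = 1.$$

The polynomial

$$\prod_{s=1}^{p-1} \left( x - \frac{\varepsilon_s}{\varepsilon_{p-s}} \right) = \prod_{s=1}^{p-1} (\varepsilon_{p-s}\, x - \varepsilon_s)$$

has coefficients in $Z$, so, by Theorem 2-40, $\varepsilon_1/\varepsilon_{p-1}$ is a root of unity, and by Theorem 3-11,

$$\varepsilon_1 = \pm \zeta^m \varepsilon_{p-1}.$$

Since either $m$ or $p + m$ is even, and since

$$\zeta^m = \zeta^{p+m},$$

we can write

$$\varepsilon_1 = \pm \zeta^{2g} \varepsilon_{p-1}.$$

The proof will be complete if it can be shown that the plus sign is appropriate here, since then the quantities $\varepsilon_1 \zeta^{-g}$ and $\varepsilon_{p-1} \zeta^{g}$ are simultaneously equal and complex-conjugate, and are therefore real, so that $\varepsilon = \varepsilon_1 = \zeta^g (\varepsilon_{p-1} \zeta^g)$.

To show this, choose $a$ from among $0, 1, \ldots, p - 1$ so that

$$\zeta^{-g} \varepsilon_1 \equiv a \pmod{P}.$$

Then

$$\frac{\zeta^{-g} \varepsilon_1 - a}{\pi}$$

is an integer of $K_p$, and so is its complex conjugate $(\zeta^{g} \varepsilon_{p-1} - a)/\bar{\pi}$. Since $\bar{\pi} = 1 - \zeta^{p-1}$ is an associate of $\pi$, it follows that

$$\frac{\zeta^{g} \varepsilon_{p-1} - a}{\pi}$$

is an integer of $K_p$, so that

$$\zeta^{g} \varepsilon_{p-1} \equiv a \equiv \zeta^{-g} \varepsilon_1 \pmod{P}.$$

Thus

$$\frac{\varepsilon_1}{\varepsilon_{p-1}} = \pm \zeta^{2g} \quad \text{and} \quad \frac{\varepsilon_1}{\varepsilon_{p-1}} \equiv \zeta^{2g} \pmod{P}.$$

If the minus sign obtains in the equation, we have

$$-\zeta^{2g} \equiv \zeta^{2g} \pmod{P}, \qquad 2\zeta^{2g} \equiv 0 \pmod{P}, \qquad P \mid [2\zeta^{2g}], \qquad NP \mid 2^{p-1},$$

contrary to the fact that $NP = p$. The proof is complete.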


PROBLEMS 


1. Let $p$ and $q$ be distinct odd primes, and let $\zeta$ be a primitive $p$th root of unity.

(a) Show that

$$p = \prod_{s=1}^{p-1} (1 - \zeta^{2s}) = (-1)^{(p-1)/2} \prod_{s=1}^{(p-1)/2} (\zeta^s - \zeta^{-s})^2.$$

(b) Show that

$$(\zeta^s - \zeta^{-s})^q \equiv \zeta^{qs} - \zeta^{-qs} \pmod{q}.$$

(c) Deduce that

$$p^{(q-1)/2} \equiv (-1)^{\frac{1}{2}(p-1) \cdot \frac{1}{2}(q-1)} \prod_{s=1}^{(p-1)/2} \frac{\zeta^{qs} - \zeta^{-qs}}{\zeta^s - \zeta^{-s}} \pmod{q}.$$

(d) Show that the second factor on the right side of the last congruence above is $(-1)^\mu$, where $\mu$ is the number of numerically smallest residues (mod $p$) among $q, 2q, \ldots, \frac{1}{2}(p - 1)q$ which are negative, and so obtain a proof of the law of quadratic reciprocity.

2. For an odd prime $p$ and a positive integer $h$, put

$$\Phi_h(x) = \frac{x^{p^h} - 1}{x^{p^{h-1}} - 1} = x^{p^{h-1}(p-1)} + x^{p^{h-1}(p-2)} + \cdots + x^{p^{h-1}} + 1,$$

and let $\zeta$ be a zero of $\Phi_h$, and $K_{p^h} = R(\zeta)$. Then the degree of $K_{p^h}$ is at most $\varphi(p^h) = t$. Put $1 - \zeta = \pi$, and $[\pi] = P$.

(a) Show that in $K_{p^h}$, the ideal $[p]$ has the factorization $P^t$, $P$ is prime, $NP = p$, $\Phi_h(x)$ is irreducible and $K_{p^h}$ is of degree $t$.

(b) Show that $\Delta(\zeta) = (-1)^{\frac{1}{2}t(t-1)} \cdot p^{\,p^{h-1}(hp - h - 1)}$. [Hint: Notice that $1 - \zeta^{p^{h-1}}$ is a zero of $\Phi_1(1 - x)$, an irreducible polynomial of degree $p - 1$ with leading coefficient $(-1)^{p-1}$ and constant term $p$, and deduce that $N(1 - \zeta^{p^{h-1}}) = ((-1)^{p-1} p)^{p^{h-1}}$.]

(c) Show that if $L$ is any prime ideal in $K_{p^h}$ different from $P$, and if $\zeta^a \equiv \zeta^b \pmod{L}$, then $a \equiv b \pmod{p^h}$.




3-4 Fermat’s equation. For the sake of completeness, we consider first the equation

$$x^n + y^n = z^n \tag{1}$$

for the cases $n = 2$, $4$, and $3$. When these have been disposed of, Fermat’s assertion would be proved if it could be shown that (1) has no solutions in rational integers $x, y, z$, with $xyz \ne 0$, if $n$ is a prime larger than 3.

The proof that (1) is impossible when $n = 4$ depends on the following theorem, which characterizes the solutions of (1) when $n = 2$.


THEOREM 3-13. A general primitive solution (i.e., a solution in which $(x, y, z) = 1$) of

$$x^2 + y^2 = z^2, \qquad y \text{ even}, \quad x > 0, \quad y > 0, \quad z > 0$$

is given by

$$x = a^2 - b^2, \qquad y = 2ab, \qquad z = a^2 + b^2,$$

where $a$ and $b$ are prime to each other and not both odd, and $a > b > 0$.

Remark: It is clear that one of $x$ and $y$ must be even, since otherwise $x^2 + y^2 \equiv 2 \not\equiv z^2 \pmod{4}$. There is no loss in generality in assuming that it is $y$ which is even.


Proof: Suppose that $x^2 + y^2 = z^2$. Since $(x, y, z) = 1$, also $(y, z) = 1$, so that $(z - y, z + y) = 1$ or $2$. But $z$ is odd and $y$ is even, so that $(z - y, z + y) = 1$. Hence, from the equation

$$x^2 = (z - y)(z + y),$$

we deduce that $z - y$ and $z + y$ must be squares, since they are positive. Now if $t$ and $u$ are fixed integers of the same parity (both odd or both even), there are integers $a$ and $b$ such that $t = a + b$ and $u = a - b$. Hence we can put

$$z - y = (a - b)^2, \qquad z + y = (a + b)^2,$$

which gives

$$z = \frac{(a + b)^2 + (a - b)^2}{2} = a^2 + b^2,$$

$$y = \frac{(a + b)^2 - (a - b)^2}{2} = 2ab,$$

$$x = (a - b)(a + b) = a^2 - b^2.$$

Since $(z - x, z + x) = (2b^2, 2a^2) = 2$, we must choose $a$ and $b$ so that $(a, b) = 1$. Since $z$ is odd, $a + b$ must be odd. Since $y > 0$, $a$ and $b$ must have the same sign, and since $x > 0$, $|a| > |b|$. Since the pairs $a, b$ and $-a, -b$ give the same solution, we can suppose that $a > b > 0$.
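
The parametrization in Theorem 3-13 translates directly into a generator of primitive solutions. The sketch below is illustrative only (the function name and the bound `limit` on z are assumptions made for the example): it runs over coprime pairs a > b > 0 of opposite parity and returns the triples x = a^2 - b^2, y = 2ab, z = a^2 + b^2.

```python
from math import gcd


def primitive_triples(limit):
    """Return primitive (x, y, z) with y even, x^2 + y^2 = z^2 and z <= limit,
    built from x = a^2 - b^2, y = 2ab, z = a^2 + b^2."""
    triples = []
    a = 2
    while a * a + 1 <= limit:
        for b in range(1, a):
            # a and b must be coprime and not both odd (opposite parity).
            if (a - b) % 2 == 1 and gcd(a, b) == 1:
                x, y, z = a * a - b * b, 2 * a * b, a * a + b * b
                if z <= limit:
                    triples.append((x, y, z))
        a += 1
    return triples


if __name__ == "__main__":
    for triple in primitive_triples(50):
        print(triple)
```

Comparing the output against a brute-force enumeration of all coprime solutions with even y, for a small bound, is a convenient way to see that the parametrization neither misses nor repeats a solution.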


THEOREM 3-14. The equation $x^4 + y^4 = z^4$ is not solvable in nonzero rational integers.

Proof: It suffices to show that there is no primitive solution of the equation

$$x^4 + y^4 = z^2.$$

Suppose that $x$, $y$, and $z$ constitute such a solution; with no loss in generality we can take $x > 0$, $y > 0$, $z > 0$, and $y$ even. Writing the supposed relation in the form

$$(x^2)^2 + (y^2)^2 = z^2,$$

we have from the preceding theorem that

$$x^2 = a^2 - b^2, \qquad y^2 = 2ab, \qquad z = a^2 + b^2,$$

where $(a, b) = 1$ and exactly one of $a$ and $b$ is odd. If $a$ were even, we would have

$$1 \equiv x^2 = a^2 - b^2 \equiv -1 \pmod{4},$$

so $2 \mid b$. We apply Theorem 3-13 again, this time to the equation $x^2 + b^2 = a^2$, and obtain

$$x = p^2 - q^2, \qquad b = 2pq, \qquad a = p^2 + q^2,$$

where $(p, q) = 1$, $p > q > 0$, and not both of $p$ and $q$ are odd. From

$$y^2 = 2ab$$

we have

$$y^2 = 4pq(p^2 + q^2).$$

Here $p$, $q$ and $p^2 + q^2$ are relatively prime in pairs, so each must be a square:

$$p = r^2, \qquad q = s^2, \qquad p^2 + q^2 = t^2,$$

from which

$$r^4 + s^4 = t^2, \qquad z = a^2 + b^2 = (r^4 + s^4)^2 + 4r^4 s^4,$$

so that

$$z > (r^4 + s^4)^2 = t^4,$$

or $t < z^{1/4}$. It follows that if one solution of $x^4 + y^4 = z^2$ were known, another solution $r, s, t$ could be found for which $rst \ne 0$ and $0 < t < z^{1/4}$. But this would give an infinite decreasing sequence of positive integers.


The case $n = 3$ is rather more difficult, since it is necessary to work in the quadratic field $K_3 = R(\zeta)$, where $\zeta = (-1 + i\sqrt{3})/2$ is a primitive cube root of unity. Not all the complications of the general case are present, however, since there is unique factorization of the integers of $K_3$, as the following theorem shows.


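THEOREM 3-15. Given any two integers $\alpha$ and $\gamma$ of $K_3$, of which $\gamma \ne 0$, there are integers $\kappa$ and $\rho$ such that

$$\alpha = \kappa\gamma + \rho, \qquad 0 \le N\rho < N\gamma.$$

The integers of $K_3$ therefore form a Euclidean domain.

Proof: Since $1$ and $\zeta$ form an integral basis for $K_3$, we can write

$$\frac{\alpha}{\gamma} = \frac{a + b\zeta}{c + d\zeta} = \frac{(a + b\zeta)(c + d\zeta^2)}{c^2 - cd + d^2} = R + S\zeta,$$

where $a$, $b$, $c$, and $d$ are rational integers, and $R$ and $S$ are rational. Choose $x$ and $y$ in $Z$ such that

$$|R - x| \le \tfrac{1}{2}, \qquad |S - y| \le \tfrac{1}{2};$$

then

$$N\left( \frac{\alpha}{\gamma} - (x + y\zeta) \right) = (R - x)^2 - (R - x)(S - y) + (S - y)^2 \le \tfrac{3}{4}.$$

Hence, if $\kappa = x + y\zeta$ and $\rho = \alpha - \kappa\gamma$, then

$$N\rho \le \tfrac{3}{4} N\gamma < N\gamma, \quad \text{and} \quad N\rho = \rho\bar{\rho} = |\rho|^2 \ge 0.$$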
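
The division step in this proof is constructive, and a small sketch may make it concrete. The representation below is an assumption made for illustration: an integer a + b*zeta of K_3 is stored as the pair `(a, b)`, with zeta^2 = -1 - zeta, and all function names are hypothetical. The quotient is obtained by rounding the rational coordinates R and S to nearest integers, exactly as in the proof.

```python
from fractions import Fraction


def mul(u, v):
    """(a + b*zeta)(c + d*zeta) with zeta^2 = -1 - zeta."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c - b * d)


def norm(u):
    """N(a + b*zeta) = a^2 - a*b + b^2."""
    a, b = u
    return a * a - a * b + b * b


def conj(u):
    """Complex conjugate: a + b*zeta^2 = (a - b) - b*zeta."""
    a, b = u
    return (a - b, -b)


def divmod_eisenstein(alpha, gamma):
    """Return (kappa, rho) with alpha = kappa*gamma + rho and N(rho) <= (3/4) N(gamma)."""
    n = norm(gamma)
    if n == 0:
        raise ZeroDivisionError("gamma must be nonzero")
    num = mul(alpha, conj(gamma))            # alpha * conj(gamma)
    R, S = Fraction(num[0], n), Fraction(num[1], n)
    kappa = (round(R), round(S))             # nearest integers, so |R-x|, |S-y| <= 1/2
    prod = mul(kappa, gamma)
    rho = (alpha[0] - prod[0], alpha[1] - prod[1])
    return kappa, rho


if __name__ == "__main__":
    print(divmod_eisenstein((7, 5), (2, -3)))
```

Iterating this division gives the Euclidean algorithm, and hence greatest common divisors, in the ring of integers of K_3.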
THEOREM 3-16. The equation

$$\xi^3 + \eta^3 + \vartheta^3 = 0 \tag{2}$$

has no solution in nonzero integers of $K_3$. It therefore has no solution in nonzero rational integers.

Proof: We first note that one of $\xi$, $\eta$, and $\vartheta$ must be divisible by the prime $\pi = 1 - \zeta$, if (2) holds. For put

$$\xi + \eta = \rho, \qquad \eta + \vartheta = \sigma, \qquad \vartheta + \xi = \tau.$$

Then a simple calculation, using (2), shows that

$$(\rho + \sigma + \tau)^3 = 24\rho\sigma\tau.$$

Since the expression on the right side of this equation is divisible by

$$3 = -\zeta^2 \pi^2,$$

the left side must be divisible by $\pi$, and therefore by $\pi^3$. Returning to the right side, it follows that one of $\rho$, $\sigma$, or $\tau$ must be divisible by $\pi$. If $\pi \mid \rho$, then $\pi \mid (\xi^3 + \eta^3)$, so $\pi \mid \vartheta^3$, and finally $\pi \mid \vartheta$.

If there were a common factor in two of $\xi$, $\eta$, and $\vartheta$, it would also occur in the third, and could be divided out; so suppose that (2) holds, that $\xi$, $\eta$, and $\vartheta$ are relatively prime in pairs, and that $\pi \mid \vartheta$.

By Theorem 3-10, we may suppose that an appropriate power of $\zeta$ has been introduced into $\xi$ and $\eta$ so that

$$\xi \equiv 1, \qquad \eta \equiv -1 \pmod{3},$$

which we express by putting

$$\xi = 1 + 3\alpha, \qquad \eta = -1 + 3\beta,$$

where $\alpha$ and $\beta$ are integers of $K_3$. Put

$$A = \frac{\xi + \zeta\eta}{\pi}, \qquad B = \frac{\zeta\xi + \eta}{\pi}, \qquad C = \frac{\zeta^2(\xi + \eta)}{\pi}; \tag{3}$$

these numbers are integers of $K_3$, since

$$A = 1 + \frac{3}{\pi}(\alpha + \zeta\beta), \qquad B = -1 + \frac{3}{\pi}(\zeta\alpha + \beta), \qquad C = \frac{3}{\pi}(\alpha + \beta)\zeta^2.$$

Moreover,

$$ABC = \frac{\xi^3 + \eta^3}{\pi^3} = \left( \frac{-\vartheta}{\pi} \right)^3, \tag{4}$$

$$\xi = -\zeta A + \zeta^2 B, \qquad \eta = \zeta^2 A - \zeta B. \tag{5}$$

From (5) we see that $(A, B) = 1$, since otherwise $\xi$ and $\eta$ would have a common factor. From (3), $A + B + C = 0$, so that also $(A, C) = (B, C) = 1$.

It follows from (4) that $A$, $B$, and $C$ must all be cubes, say $A = \varphi^3$, $B = \chi^3$, $C = \psi^3$, and

$$\varphi^3 + \chi^3 + \psi^3 = 0.$$

Now

$$A \equiv 1, \qquad B \equiv -1, \qquad C \equiv 0 \pmod{\pi},$$

so that from (4), $\psi$ contains a smaller power of $\pi$ than does $\vartheta$.

Repeating the argument a sufficient number of times, we would arrive eventually at a solution of (2) in which no variable is divisible by $\pi$, which is impossible.


3-5 Kummer’s theorem. If $p$ is an odd rational prime, and its associated cyclotomic field $K_p$ has class number $h$, then $p$ is said to be regular if $p \nmid h$. According to Theorem 3-5, if $p$ is a regular prime and $A$ and $B$ are ideals in $K_p$ such that $A^p \sim B^p$, then $A \sim B$. It was this essential property of the regular primes which enabled Kummer to prove that Fermat’s conjecture is correct for all regular primes. (Unfortunately, there are infinitely many irregular ones.) We shall not be able to prove Kummer’s theorem in its entirety, but shall have to assume without proof a difficult preliminary result. We can, however, prove the following theorem.


THEOREM 3-17. If $p$ is regular, the equation

$$x^p + y^p + z^p = 0 \tag{6}$$

has no solution in rational integers $x, y, z$ for which $p \nmid xyz$.

Proof: Suppose that the theorem is false, and that $x$, $y$, and $z$ satisfy all the requirements. We can assume that $(x, y) = 1$ and $p > 3$; as usual, $\zeta$ is a primitive $p$th root of unity, and $P = [1 - \zeta]$. From (6) we obtain

$$\prod_{m=0}^{p-1} (x + \zeta^m y) = -z^p,$$

so that

$$\prod_{m=0}^{p-1} [x + \zeta^m y] = [z]^p. \tag{7}$$

Now no two of the factors on the left have a common factor. For, if $Q$ is a prime ideal such that $Q \mid [x + \zeta^{m_1} y]$ and $Q \mid [x + \zeta^{m_2} y]$ for $m_1 < m_2$, then

$$Q \mid [\zeta^{m_1}(1 - \zeta^{m_2 - m_1}) y],$$

and hence $Q \mid P[y]$. But from (7), $Q \mid [z]$, so $Q \ne P$ (since $p \nmid z$); hence $Q \mid [y]$. But then also $Q \mid [x]$, and we deduce that $NQ \mid y^{p-1}$ and $NQ \mid x^{p-1}$, which is contrary to the assumption that $(x, y) = 1$.

It follows that each factor on the left side of (7) is the $p$th power of an ideal. If

$$[x + \zeta y] = A^p,$$

then $A^p \sim [1] = [1]^p$, so that by Theorem 3-5, $A$ itself is principal, say $A = [\alpha]$. Then

$$[x + \zeta y] = [\alpha]^p = [\alpha^p].$$

Hence

$$x + \zeta y = \varepsilon \alpha^p,$$

where $\varepsilon$ is a unit of $K_p$. Using the canonical form for units in $K_p$ obtained in Theorem 3-12, we have

$$x + \zeta y = \zeta^g \eta \alpha^p, \qquad 0 \le g \le p - 1,$$

where $\eta$ is real. By Theorem 3-9, since $[p] \mid P^p$,

$$\alpha^p \equiv a \pmod{[p]}$$

for some $a$ in $Z$, so that

$$x + \zeta y \equiv \zeta^g \sigma \pmod{[p]},$$

where $\sigma$ is a real integer of $K_p$. The complex conjugate of the integer

$$\frac{\zeta^{-g}(x + \zeta y) - \sigma}{p}$$

is also a field conjugate, and is therefore also an integer. Since $\bar{p} = p$ and $\bar{\sigma} = \sigma$, we have

$$\sigma \equiv (x + \zeta y)\zeta^{-g} \pmod{[p]},$$

and

$$\sigma \equiv (x + \zeta^{-1} y)\zeta^{g} \pmod{[p]},$$

so that

$$x\zeta^{-g} + y\zeta^{1-g} - x\zeta^{g} - y\zeta^{g-1} \equiv 0 \pmod{[p]}. \tag{8}$$

Two of these exponents must be congruent modulo $p$. For suppose that they are all distinct, and put

$$\beta = \frac{x}{p}\zeta^{-g} + \frac{y}{p}\zeta^{1-g} - \frac{x}{p}\zeta^{g} - \frac{y}{p}\zeta^{g-1}.$$

Then $p\beta$ has a representation in terms of distinct elements of an integral basis, the coefficients not being divisible by $p$. But since $\beta$ is an integer, $p\beta$ also has a representation in which the coefficients are divisible by $p$, and this is contrary to the definition of a basis. We conclude that $g$ must have one of the values $0$, $1$, or $(p + 1)/2$ (that is, $2g \equiv 1 \pmod{p}$).

If $g = 0$, the congruence (8) gives

$$y\zeta - y\zeta^{-1} \equiv 0 \pmod{[p]},$$

whence, since $\zeta^2 - 1$ is an associate of $\pi$,

$$P^{p-1} \mid [y]P, \qquad P^{p-2} \mid [y], \qquad p \mid y,$$

which is false. If $g = 1$, then (8) yields

$$x(1 - \zeta^2) \equiv 0 \pmod{[p]},$$

which implies that $p \mid x$, which is also false. Finally, if $g = (p + 1)/2$, then from (8) we get

$$(x - y)\pi \equiv 0 \pmod{[p]},$$

which gives

$$x \equiv y \pmod{p}.$$

Interchanging $y$ and $z$ in equation (7), we deduce that also

$$x \equiv z \pmod{p}.$$

But then equation (6) implies that

$$x^p + y^p + z^p \equiv 3x^p \equiv 0 \pmod{p},$$

which is false since $p > 3$ and $p \nmid x$. Hence the theorem is not false.

Because of its methodological interest, we deduce the general Kummer theorem from the following lemma, whose proof is too long for inclusion here:

KUMMER’S LEMMA. Let $p$ be a regular prime. Then if $\varepsilon$ is a unit of $K_p$ and $a$ is a rational integer such that

$$\varepsilon \equiv a \pmod{P^p},$$

then $\varepsilon$ is the $p$th power of another unit of $K_p$.

This is a partial converse of Theorem 3-9. Using it, we can generalize Theorem 3-17 in two ways: by allowing $x$, $y$, and $z$ to be integers of $K_p$ instead of rational integers, or by dropping the restriction that $p \nmid xyz$.


THEOREM 3-18. If $p$ is a regular prime, the equation

$$x^p + y^p + z^p = 0$$

has no solution in nonzero integers $x, y, z$ of $K_p$ for which $\pi \mid xyz$. It therefore has no solution in nonzero rational integers $x, y, z$ for which $p \mid xyz$, and therefore (by Theorem 3-17) no nonzero rational integral solutions.

Proof: We first show that the equation

$$x^p + y^p = \varepsilon \pi^{pu} (z')^p, \qquad \pi \nmid xyz', \quad \varepsilon \text{ a unit of } K_p, \tag{9}$$

has no nontrivial solution if $u = 1$. Equation (9) is a generalization of the equation obtained from (6) by supposing that $z = z'\pi^u$, where $\pi \nmid z'$.

We may suppose that $x$ and $y$ have no common numerical factor, since it would also occur in $z'$ and could be canceled out. (Notice that it cannot be assumed that the ideals $[x]$ and $[y]$ are relatively prime, since $[x, y]$ may not be principal.) We may also suppose that $x$ and $y$ are primary, since they may be multiplied by appropriate powers of $\zeta$ without affecting (9). If (9) is written in the form

$$\prod_{m=0}^{p-1} (x + \zeta^m y) = \varepsilon \pi^{pu} (z')^p, \tag{9'}$$

it is clear that at least one of the factors on the left, say $x + \zeta^k y$, is divisible by $\pi$. Since, however, the differences

$$(x + \zeta^k y) - (x + \zeta^l y) = (\zeta^k - \zeta^l) y$$

and

$$\zeta^l (x + \zeta^k y) - \zeta^k (x + \zeta^l y) = -(\zeta^k - \zeta^l) x$$

are also divisible by $\pi$, each factor on the left in (9') must be divisible by $\pi$. If two factors were divisible by $\pi^2$, we would have

$$\pi^2 \mid (\zeta^k - \zeta^l) y, \qquad \pi^2 \mid \varepsilon' \pi y \quad (\varepsilon' \text{ a unit}), \qquad \pi \mid y,$$

and similarly $\pi \mid x$, contrary to assumption. On the other hand, since $x$ and $y$ are primary, there is an $a$ in $Z$ such that

$$x + y \equiv a \pmod{P^2}.$$

But then

$$a \equiv x + y \equiv 0 \pmod{P}, \qquad p \mid a, \qquad P^2 \mid [a], \qquad x + y \equiv 0 \pmod{P^2}.$$

Thus the total number of factors of $\pi$ on the left side of (9') is at least $p + 1$, so that $u > 1$.

Now rewrite (9) as

$$\prod_{m=0}^{p-1} [x + \zeta^m y] = P^{pu} [z']^p. \tag{9''}$$

Any common factor different from $P$ of two ideals in the product on the left side of (9'') must be a factor of both $[x]$ and $[y]$, and therefore of $[z']$. After dividing out every such common factor, as well as one factor of $P$ from each ideal, the ideals remaining on the left are pairwise prime, and their product is a $p$th power; therefore each factor separately is a $p$th power.

Combining all these results, we can write

$$[x + y] = P^{p(u-1)+1} J_0^{\,p} D,$$

$$[x + \zeta^m y] = P J_m^{\,p} D, \qquad m = 1, \ldots, p - 1,$$

where $D = [x, y]$, and $J_0, J_1, \ldots, J_{p-1}$ are certain ideals not divisible by $P$. If we put $t_m = p(u - 1) + 1$ or $1$, according as $m = 0$ or $m > 0$, we have, for $m \ne 1$,

$$[x + \zeta y] P^{t_m} J_m^{\,p} D = [x + \zeta y][x + \zeta^m y] = P J_1^{\,p} D [x + \zeta^m y];$$

since $P$ is a principal ideal, it follows that

$$J_m^{\,p} D \sim J_1^{\,p} D,$$

so that $J_m^{\,p} \sim J_1^{\,p}$, and by Theorem 3-5, $J_m \sim J_1$. Thus integers $\gamma_m$ and $\delta_m$ (which are not divisible by $\pi$) exist such that

$$[\gamma_m] J_m = [\delta_m] J_1, \qquad m = 0, 2, 3, \ldots, p - 1. \tag{10}$$

Raising both sides of (10) (with $m = 0$) to the $p$th power, and then multiplying through by $D P^{p(u-1)+1}$, we have

$$[\gamma_0]^p D P^{p(u-1)+1} J_0^{\,p} = D P J_1^{\,p} \cdot P^{p(u-1)} [\delta_0]^p,$$

or

$$[\gamma_0^{\,p}][x + y] = [x + \zeta y] P^{p(u-1)} [\delta_0^{\,p}].$$

Similarly,

$$D P [\gamma_2]^p J_2^{\,p} = D P [\delta_2]^p J_1^{\,p},$$

$$[x + \zeta^2 y][\gamma_2^{\,p}] = [x + \zeta y][\delta_2^{\,p}],$$

so that

$$\gamma_0^{\,p} (x + y) = \varepsilon_1 (x + \zeta y) \pi^{p(u-1)} \delta_0^{\,p}, \qquad \gamma_2^{\,p} (x + \zeta^2 y) = \varepsilon_2 (x + \zeta y) \delta_2^{\,p}, \tag{11}$$

where $\varepsilon_1$ and $\varepsilon_2$ are units.

We now use the identity

$$(x + \zeta^2 y) + (x + y)\zeta = (x + \zeta y)(1 + \zeta).$$

We multiply through by $\gamma_0^{\,p} \gamma_2^{\,p}$, and in the resulting equation replace the left sides of equations (11) by the right sides. After canceling the common factor $x + \zeta y$, there results

$$\varepsilon_2 (\gamma_0 \delta_2)^p + \varepsilon_1 \zeta \pi^{p(u-1)} (\gamma_2 \delta_0)^p = (1 + \zeta)(\gamma_0 \gamma_2)^p.$$

Since $\varepsilon_1$, $\varepsilon_2$, $\zeta$, and $1 + \zeta = (1 - \zeta^2)/(1 - \zeta)$ are units, this equation is of the form

$$\xi^p + \varepsilon_3 \eta^p = \varepsilon_4 \pi^{p(u-1)} \vartheta^p, \tag{12}$$

where $\varepsilon_3$ and $\varepsilon_4$ are units and $\pi \nmid \xi\eta\vartheta$. By Theorem 3-9,

$$\xi^p \equiv a_1, \qquad \eta^p \equiv a_2 \pmod{P^p},$$

where $a_1$ and $a_2$ are rational integers; since $u > 1$, (12) gives

$$a_1 + \varepsilon_3 a_2 \equiv 0 \pmod{P^p}.$$

Since $\pi \nmid \eta$, also $\pi \nmid a_2$, so that $p \nmid a_2$. Choose $a_3$ so that $a_2 a_3 \equiv 1 \pmod{p^2}$; then

$$a_2 a_3 \equiv 1 \pmod{P^p}, \qquad a_1 a_3 + \varepsilon_3 \equiv 0 \pmod{P^p}.$$

By Kummer’s lemma, $\varepsilon_3 = \varepsilon_5^{\,p}$, and (12) becomes

$$\xi^p + (\varepsilon_5 \eta)^p = \varepsilon_4 \pi^{p(u-1)} \vartheta^p,$$

which is an equation of the form (9) with $u$ replaced by $u - 1$.

Repeating the argument $u - 2$ times, we would have a solution of (9) with $u = 1$, which is impossible.




Before leaving the subject of Fermat’s conjecture, it might be of some interest to mention certain other facts known about it. We consider only the solvability of equation (6),

$$x^p + y^p + z^p = 0,$$

in $Z$.

It was proved by Wieferich in 1909 that if (6) holds in integers $x$, $y$, and $z$ such that $p \nmid xyz$ (the so-called Case I), then

$$2^{p-1} \equiv 1 \pmod{p^2}.$$

Later investigators have shown that in Case I,

$$q^{p-1} \equiv 1 \pmod{p^2}$$

for every prime $q \le 43$; J. B. Rosser used this fact to show that there are no solutions in Case I for $p < 41{,}000{,}000$. D. H. and Emma Lehmer later extended Rosser’s method to prove Fermat’s conjecture in Case I for $p < 253{,}747{,}889$. This in turn implies that if there is a solution in Case I, it must be that $\log \log z > 23$.
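
Wieferich's criterion is easy to test by machine, which gives some feeling for how restrictive it is. The following is a small sketch with illustrative function names, using only trial division and modular exponentiation; it lists the odd primes p below a bound for which 2^(p-1) is congruent to 1 modulo p^2.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def wieferich_primes(limit):
    """Odd primes p < limit with 2^(p-1) = 1 (mod p^2)."""
    return [p for p in range(3, limit, 2)
            if is_prime(p) and pow(2, p - 1, p * p) == 1]


if __name__ == "__main__":
    print(wieferich_primes(4000))  # [1093, 3511]
```

Below 4000 only 1093 and 3511 satisfy the congruence, so for every other prime in that range Case I is excluded by Wieferich's criterion alone.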

Without the restriction to Case I, Theorem 3-18 disposes of the regular primes. Kummer also found criteria to handle the irregular primes less than 164; this was pushed on to all $p < 619$ by H. S. Vandiver and his collaborators, and quite recently D. H. and E. Lehmer and Vandiver have used high-speed computing techniques to settle the problem for all $p < 2000$. It turns out that of the 302 primes less than 2000, 118 are irregular; while it is not known that there are infinitely many regular primes, there is nothing in the limited data available to indicate that there are only finitely many.


3-6 The equation $x^2 + 2 = y^3$. For the remainder of this chapter we shall be primarily concerned with the cubic analog of Pell’s equation. At one point in the argument, however, we shall need the following auxiliary result.

THEOREM 3-19. The only solutions in $Z$ of the equation

$$x^2 + 2 = y^3 \tag{13}$$

are $x = \pm 5$, $y = 3$.


Proof: Following Euler’s idea, we make use of the arithmetic of the quadratic field $R(\sqrt{-2})$. By Theorem 2-16, the integers of this field are of the form $a + b\sqrt{-2}$, where $a$ and $b$ are rational integers. By a proof exactly paralleling that of Theorem 3-15, it can be shown that they form a Euclidean domain: given $a, b, c, d$ in $Z$, with $cd \ne 0$, there are $e, f, g, h$ in $Z$ such that

$$a + b\sqrt{-2} = (c + d\sqrt{-2})(e + f\sqrt{-2}) + (g + h\sqrt{-2}),$$

$$g^2 + 2h^2 < c^2 + 2d^2.$$

It follows that $R[\sqrt{-2}]$ is a unique factorization domain.

We first show that if $x$ and $y$ satisfy (13), then $x + \sqrt{-2}$ and $x - \sqrt{-2}$ are relatively prime. It is clear that

$$(x + \sqrt{-2},\ x - \sqrt{-2}) \mid 2\sqrt{-2},$$

and since $-2\sqrt{-2} = (\sqrt{-2})^3$ and $\sqrt{-2}$ is prime in the domain (by Theorem 2-39), it must be that $(x + \sqrt{-2},\ x - \sqrt{-2}) = (\sqrt{-2})^m$, $0 \le m \le 3$. But if $x + \sqrt{-2} = (a + b\sqrt{-2})\sqrt{-2}$, then $x = -2b$, whence, by (13),

$$4b^2 + 2 = y^3, \qquad y^3 \equiv 2 \pmod{4},$$

which is impossible.

Since the only units of $R[\sqrt{-2}]$ are $\pm 1$, it follows from (13) that

$$x + \sqrt{-2} = (a + b\sqrt{-2})^3,$$

where $a$ and $b$ are rational integers, and equating real and imaginary parts gives

$$a^3 - 6ab^2 = x,$$

$$3a^2 b - 2b^3 = 1.$$

From the second of these equations it follows that $b = \pm 1$, and hence that $3a^2 - 2 = \pm 1$, or $a = \pm 1$. From the first, $x = \pm 1 \mp 6 = \pm 5$.
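
Theorem 3-19 can be illustrated, though of course not proved, by a direct search. The sketch below uses an illustrative function name and an arbitrary bound `y_max`; for each y it tests whether y^3 - 2 is a perfect square, using exact integer square roots.

```python
from math import isqrt


def solutions_x2_plus_2_equals_y3(y_max):
    """All integer pairs (x, y) with x^2 + 2 = y^3 and 1 <= y <= y_max."""
    sols = []
    for y in range(1, y_max + 1):
        t = y ** 3 - 2
        if t < 0:
            continue
        x = isqrt(t)
        if x * x == t:
            sols.append((x, y))
            if x > 0:
                sols.append((-x, y))
    return sols


if __name__ == "__main__":
    print(solutions_x2_plus_2_equals_y3(10_000))  # [(5, 3), (-5, 3)]
```

Since y^3 = x^2 + 2 is positive, only positive y need be searched.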


3-7 Pure cubic fields. The field $L = R(\sqrt[3]{d})$, in which $d > 1$ is a cube-free rational integer and $\sqrt[3]{d}$ is real, is called a pure cubic field. In this section we determine an integral basis for $L$ and note certain other properties.

Since $d$ is cube-free, we can write

$$d = ab^2,$$

where $ab$ is square-free. Since $\sqrt[3]{d^2} = b\sqrt[3]{a^2 b}$, the numbers $1$, $\sqrt[3]{ab^2}$, $\sqrt[3]{a^2 b}$ form a basis for $L$. Following Dedekind, we say that $L$ is of the first or second kind, according as 9 does not, or does, divide $a^2 - b^2$. The reason for the distinction is made clear in the following theorem.


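THEOREM 3-20. The numbers

$$1, \quad \sqrt[3]{ab^2}, \quad \sqrt[3]{a^2 b}$$

form an integral basis for $L$ if it is of the first kind. The numbers

$$\tfrac{1}{3}\left(1 + a\sqrt[3]{ab^2} + b\sqrt[3]{a^2 b}\right), \quad \sqrt[3]{ab^2}, \quad \sqrt[3]{a^2 b}$$

form an integral basis for $L$ if it is of the second kind.

Remark: Note that the second basis represents every integer represented by the first, since

$$3z_1 \cdot \frac{1 + a\sqrt[3]{ab^2} + b\sqrt[3]{a^2 b}}{3} + (z_2 - az_1)\sqrt[3]{ab^2} + (z_3 - bz_1)\sqrt[3]{a^2 b} = z_1 + z_2\sqrt[3]{ab^2} + z_3\sqrt[3]{a^2 b}.$$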
Proof: Suppose that $\omega$ is an integer in $L$, and that

$$\omega = x_1 + x_2\sqrt[3]{ab^2} + x_3\sqrt[3]{a^2 b}, \qquad x_1, x_2, x_3 \text{ in } R.$$

Then the conjugates of $\omega$ are

$$\omega' = x_1 + \rho x_2\sqrt[3]{ab^2} + \rho^2 x_3\sqrt[3]{a^2 b},$$

$$\omega'' = x_1 + \rho^2 x_2\sqrt[3]{ab^2} + \rho x_3\sqrt[3]{a^2 b},$$

where $\rho$ is a primitive cube root of unity. We see that

$$\omega + \omega' + \omega'' = 3x_1,$$

$$\sqrt[3]{a^2 b}\,(\omega + \rho^2 \omega' + \rho \omega'') = 3ab x_2,$$

$$\sqrt[3]{ab^2}\,(\omega + \rho \omega' + \rho^2 \omega'') = 3ab x_3,$$

and since the left sides of these equations are algebraic integers and the right sides are rational, it follows that the numbers $3x_1$, $3abx_2$, $3abx_3$ are rational integers. Hence for any integer $\omega$ in $L$, there are $y_1, y_2, y_3$ in $Z$ such that

$$3ab\,\omega = y_1 + y_2\sqrt[3]{ab^2} + y_3\sqrt[3]{a^2 b}. \tag{14}$$

We show first that $ab$ is a divisor of $y_1$, $y_2$, and $y_3$, and so can be omitted in (14).

Let $p$ be a rational prime dividing $a$, and let $P$ be a prime ideal of $L$ which divides $[p]$. It was supposed that $ab$ is square-free; a fortiori, $(a, b) = 1$, and $P \nmid [b]$. If we put

$$\alpha = \sqrt[3]{ab^2}, \qquad \beta = \sqrt[3]{a^2 b},$$

then $P \mid [\alpha]^3$, so $P^3 \mid [\alpha]^3$; since $L$ is of degree 3, it follows from Theorem 2-39 that $[p] = P^3$. Hence $P \,\|\, [\alpha]$ and $P^2 \,\|\, [\beta]$.

Now suppose, in accordance with (14), that

$$y_1 + y_2\alpha + y_3\beta \equiv 0 \pmod{3ab}.$$

Then

$$y_1 + y_2\alpha + y_3\beta \equiv 0 \pmod{P^3}, \qquad y_1 \equiv 0 \pmod{P}, \qquad y_1 \equiv 0 \pmod{p}; \tag{15}$$

$$y_1 \equiv 0 \pmod{P^3}, \qquad y_2\alpha + y_3\beta \equiv 0 \pmod{P^3}, \qquad y_2\alpha \equiv 0 \pmod{P^2}, \qquad y_2 \equiv 0 \pmod{p}, \tag{16}$$

$$y_3\beta \equiv 0 \pmod{P^3}, \qquad y_3 \equiv 0 \pmod{p}. \tag{17}$$

By equations (15), (16), and (17), and the fact that $p$ was an arbitrary prime divisor of $a$, we see that $a$ divides $y_1$, $y_2$, and $y_3$. Similarly, $b$ divides $y_1$, $y_2$, and $y_3$. It follows that there are $z_1, z_2, z_3$ in $Z$ such that

$$3\omega = z_1 + z_2\alpha + z_3\beta. \tag{18}$$

Let the defining equation of $\omega$ be

$$x^3 + c_1 x^2 + c_2 x + c_3 = 0, \qquad c_1, c_2, c_3 \text{ in } Z.$$

Then by (18) and the analogous equations for $3\omega'$ and $3\omega''$,

$$c_1 = -(\omega + \omega' + \omega'') = -z_1,$$

$$c_2 = \omega\omega' + \omega\omega'' + \omega'\omega'' = \tfrac{1}{3}(z_1^2 - ab z_2 z_3), \tag{19}$$

$$c_3 = -\omega\omega'\omega'' = -\tfrac{1}{27}(z_1^3 + ab^2 z_2^3 + a^2 b z_3^3 - 3ab z_1 z_2 z_3). \tag{20}$$

Suppose that $3 \mid a$; then $3 \nmid b$, and $L$ is of the first kind. Since $c_2$ is in $Z$, $3 \mid z_1$. Since $c_3$ is in $Z$,

$$0 \equiv -27c_3 \equiv 3 \cdot \frac{a}{3}\, b^2 z_2^3 \pmod{9},$$

whence $3 \mid z_2$, and by (20) again, $3 \mid z_3$. In this case, then, the numbers $1$, $\sqrt[3]{ab^2}$, $\sqrt[3]{a^2 b}$ constitute an integral basis for $L$. A similar argument applies in the case that $3 \mid b$.

Suppose now that $3 \nmid ab$, so that

$$a^2 \equiv b^2 \equiv 1 \pmod{3}. \tag{21}$$

If $3 \mid z_1$, then by (19), $3 \mid z_2 z_3$; if $3 \mid z_2$, say, then it follows from (20) that also $3 \mid z_3$. Similarly, if $3 \mid z_2$, then also $3 \mid z_1$ and $3 \mid z_3$. Hence 3 divides all or none of $z_1$, $z_2$, $z_3$; in the first case $\omega$ is of the form specified in the theorem.

We now examine the possibility that $\omega$ in (18) is an integer, but that $3 \nmid z_1 z_2 z_3$. Then by (20), (21), and Fermat’s theorem,

$$z_1^3 + ab^2 z_2^3 + a^2 b z_3^3 \equiv 0 \pmod{3},$$

$$z_1 + a z_2 + b z_3 \equiv 0 \pmod{3},$$

$$z_1 \equiv a z_2 \equiv b z_3 \pmod{3},$$

$$z_2 \equiv a z_1, \qquad z_3 \equiv b z_1 \pmod{3},$$

$$z_2 = a z_1 + 3t_2, \qquad z_3 = b z_1 + 3t_3.$$

Substituting these expressions for $z_2$ and $z_3$ into (20), we obtain

$$-27c_3 = z_1^3 + ab^2 (a z_1 + 3t_2)^3 + a^2 b (b z_1 + 3t_3)^3 - 3ab z_1 (a z_1 + 3t_2)(b z_1 + 3t_3)$$

$$= z_1^3 (1 + a^4 b^2 + a^2 b^4 - 3a^2 b^2) + 9 z_1^2 (a^3 b^2 t_2 + a^2 b^3 t_3 - a^2 b t_3 - ab^2 t_2)$$

$$\quad + 27 z_1 (a^2 b^2 t_2^2 + a^2 b^2 t_3^2 - ab t_2 t_3) + 27 (ab^2 t_2^3 + a^2 b t_3^3)$$

$$\equiv z_1^3 (1 + a^4 b^2 + a^2 b^4 - 3a^2 b^2) + 9 z_1^2 \left( ab^2 t_2 (a^2 - 1) + a^2 b t_3 (b^2 - 1) \right) \pmod{27}.$$

By (21),

$$0 \equiv -27c_3 \equiv z_1^3 (1 + a^4 b^2 + a^2 b^4 - 3a^2 b^2) \pmod{27},$$

and it follows that

$$1 + a^4 b^2 + a^2 b^4 - 3a^2 b^2 \equiv 0 \pmod{27}. \tag{22}$$

Using (21), we can put

$$b^2 = a^2 + 3f + 9g,$$

where $f$ and $g$ are rational integers and $0 \le f \le 2$. Then the congruence (22) reduces to

$$\varphi(f, a) = 2a^6 + (9f - 3)a^4 + 9(f^2 - f)a^2 + 1 \equiv 0 \pmod{27}.$$

For $f = 0$, this becomes

$$(a^2 - 1)^2 (2a^2 + 1) \equiv 0 \pmod{27},$$

which is true for every $a$ not divisible by 3, since

$$2a^2 + 1 \equiv a^2 - 1 \equiv 0 \pmod{3}.$$

Moreover, for every $a$ such that $3 \nmid a$,

$$\varphi(1, a) = \varphi(0, a) + 9a^4 \equiv 9a^4 \not\equiv 0 \pmod{27},$$

$$\varphi(2, a) = \varphi(0, a) + 18a^4 + 18a^2 \equiv 18a^2 (a^2 + 1) \not\equiv 0 \pmod{27}.$$

Thus we find that if $3 \nmid z_1 z_2 z_3$, then $c_3$ is in $Z$ if and only if

$$a^2 \equiv b^2 \not\equiv 0 \pmod{9},$$

(i.e., if and only if $L$ is of the second kind) and $z_1 \equiv a z_2 \equiv b z_3 \pmod{3}$. If this is the case, then $c_1$ and $c_2$ are also rational integers, and

$$\omega = \tfrac{1}{3}\left( z_1 + (a z_1 + 3t_2)\alpha + (b z_1 + 3t_3)\beta \right) = z_1 \cdot \frac{1 + a\alpha + b\beta}{3} + t_2 \alpha + t_3 \beta$$

is an integer in $L$. The proof is complete.
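
The step from congruence (22) to the condition that L be of the second kind can be confirmed by direct computation. The sketch below is a plain numerical check with illustrative function names: for a and b not divisible by 3, it compares the truth of (22) with the truth of the congruence a^2 = b^2 (mod 9).

```python
def congruence_22_holds(a, b):
    """True when 1 + a^4 b^2 + a^2 b^4 - 3 a^2 b^2 is divisible by 27."""
    return (1 + a**4 * b**2 + a**2 * b**4 - 3 * a**2 * b**2) % 27 == 0


def is_second_kind(a, b):
    """True when 9 divides a^2 - b^2 (Dedekind's second kind)."""
    return (a * a - b * b) % 9 == 0


def check_equivalence(bound):
    """Compare the two conditions for all 1 <= a, b <= bound with 3 dividing neither."""
    return all(congruence_22_holds(a, b) == is_second_kind(a, b)
               for a in range(1, bound + 1) if a % 3
               for b in range(1, bound + 1) if b % 3)


if __name__ == "__main__":
    print(check_equivalence(60))
```

Both conditions depend only on a and b modulo 27, so a bound of 27 or more already covers every residue class.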


In the course of the proof, it appeared that if $\omega = (x + y\alpha + z\beta)/3$ is an integer, and if one of $x$, $y$, $z$ is divisible by 3, all of them are. In particular, if $x + y\alpha$ is an integer and $x$ and $y$ are rational, they are also integers.

We now consider the units of $L$. If $L$ is of the first kind, then

$$\eta = x + y\alpha + z\beta$$

is a unit if and only if $N\eta = \eta\eta'\eta'' = \pm 1$, or

$$x^3 + ab^2 y^3 + a^2 b z^3 - 3abxyz = \pm 1. \tag{23}$$

If $L$ is of the second kind, then

$$\eta = \frac{u}{3}(1 + a\alpha + b\beta) + v\alpha + w\beta$$

is a unit if and only if

$$x^3 + ab^2 y^3 + a^2 b z^3 - 3abxyz = \pm 27, \tag{24}$$

where $u = x$, $au + 3v = y$, and $bu + 3w = z$. If $\eta$ is positive, the plus sign must be chosen in (23) and (24), since $\eta'$ and $\eta''$ are complex conjugates.

The field $L$ has the property that each of its elements is either rational or of degree three. For if there were an element of degree two, $L$ would be an extension of the field generated by that element, and so would be of even degree. It follows that $\pm 1$ are the only roots of unity in $L$. Since $\alpha$ has one real and two nonreal conjugates, we see by Theorem 2-45 that either $L$ has only the units $\pm 1$, or else there is a fundamental unit $\xi$, which may be chosen between 0 and 1, such that every unit $\eta$ of $L$ can be expressed in the form

$$\eta = \pm \xi^n,$$

where $n$ is a rational integer, positive, negative, or zero.

A positive unit of the form $\eta = x + y\alpha$ is always smaller than 1. For since $x^3 + dy^3 = 1$, we have

$$\eta^{-1} = \eta'\eta'' = x^2 - xy\alpha + y^2\alpha^2 \ge 1 + \alpha + \alpha^2 > 3,$$

since $xy$ is negative. Consequently, for such a unit we have

$$\eta = \xi^n, \qquad n > 0.$$

The same remarks apply to a positive unit of the form $x + z\beta$.
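
A small worked instance may help with the norm form (23). The sketch below takes d = 2, so that a = 2 and b = 1 and the field is of the first kind; elements x + y*alpha + z*beta are stored as triples `(x, y, z)`, and the multiplication rules alpha^2 = b*beta, alpha*beta = ab, beta^2 = a*alpha are used. The function names and the choice of example are illustrative assumptions.

```python
def norm_form(a, b, x, y, z):
    """x^3 + a b^2 y^3 + a^2 b z^3 - 3 a b x y z, the norm of x + y*alpha + z*beta."""
    return x**3 + a * b**2 * y**3 + a**2 * b * z**3 - 3 * a * b * x * y * z


def mul(a, b, u, v):
    """Product in the basis 1, alpha, beta, where alpha^3 = a b^2 and beta = alpha^2 / b."""
    x1, y1, z1 = u
    x2, y2, z2 = v
    return (x1 * x2 + a * b * (y1 * z2 + z1 * y2),   # alpha*beta = ab
            x1 * y2 + y1 * x2 + a * z1 * z2,         # beta^2 = a*alpha
            x1 * z2 + z1 * x2 + b * y1 * y2)         # alpha^2 = b*beta


def real_value(a, b, u):
    """Real embedding of the triple u, for a numerical size check."""
    alpha = (a * b * b) ** (1.0 / 3.0)
    beta = (a * a * b) ** (1.0 / 3.0)
    return u[0] + u[1] * alpha + u[2] * beta


if __name__ == "__main__":
    a, b = 2, 1                      # d = a*b^2 = 2
    eta = (-1, 1, 0)                 # -1 + cube root of 2
    eta_inv = (1, 1, 1)              # 1 + alpha + beta
    print(norm_form(a, b, *eta), mul(a, b, eta, eta_inv), real_value(a, b, eta))
```

Here the unit -1 + alpha is positive and smaller than 1, and its inverse 1 + alpha + alpha^2 exceeds 3, in agreement with the inequality above.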


3-8 Two lemmas. For simplicity in notation, we define the binomial coefficient $\binom{m}{k}$ to be zero for $k > m$. Here and hereafter in this chapter, lower-case Latin letters stand for rational integers, unless otherwise specified.

THEOREM 3-21. Let $m$ be a positive integer. Then

$$\binom{m}{0} + \binom{m}{3} + \binom{m}{6} + \cdots \not\equiv 0 \pmod{3}.$$

Proof: Put

$$S_0 = \binom{m}{0} + \binom{m}{3} + \binom{m}{6} + \cdots,$$

$$S_1 = \binom{m}{1} + \binom{m}{4} + \binom{m}{7} + \cdots,$$

$$S_2 = \binom{m}{2} + \binom{m}{5} + \binom{m}{8} + \cdots.$$

Then

$$S_0 + S_1 + S_2 = 2^m \equiv (-1)^m \pmod{3},$$

and

$$S_2 = \binom{m}{1}\frac{m - 1}{2} + \binom{m}{4}\frac{m - 4}{5} + \cdots \equiv -mS_1 + S_1 \pmod{3},$$

$$S_1 = \binom{m}{0}\frac{m}{1} + \binom{m}{3}\frac{m - 3}{4} + \cdots \equiv mS_0 \pmod{3},$$

so that

$$(1 + 2m - m^2) S_0 \equiv (-1)^m \pmod{3}.$$
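
The final congruence of this proof, and with it the fact that the sum of the binomial coefficients with lower index divisible by 3 is never divisible by 3, can be spot-checked numerically. The sketch below uses illustrative function names and `math.comb`, which assumes Python 3.8 or later.

```python
from math import comb


def s0(m):
    """Sum of C(m, k) over k = 0, 3, 6, ... (terms with k > m are zero)."""
    return sum(comb(m, k) for k in range(0, m + 1, 3))


def final_congruence_holds(m):
    """Check (1 + 2m - m^2) * S_0 = (-1)^m (mod 3)."""
    return ((1 + 2 * m - m * m) * s0(m) - (-1) ** m) % 3 == 0


if __name__ == "__main__":
    for m in range(1, 13):
        print(m, s0(m), s0(m) % 3, final_congruence_holds(m))
```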


THEOREM 3-22. Suppose that x and y are integers such that 
(x, dy) = 1, and suppose that 


(2t+yVd)* =X+YVd+ Z(W4a)?, 


where X, Y, and Z are rational andn>1. Then XYZ <0 except 
an the following cases: 


(W/10 — 1)® = 99 — 45~/10, 
(W4 —1)* = —15 + 12/2. 


Proof: Since (z,d) = 1, it is clear that X ~ 0. Suppose that 
Z = 0, so that 


(Je ()eves ewer na a 
Dividing by i y”, this becomes 


as n—2\  Qyr-3k—-2,,3k gk 
et BI 0) 


aa 3k /) (8k +1)(@k +2). 


3-8] TWO LEMMAS 111 


Let q be a prime divisor of y. Then since g** > 2°* > 3k + 2 for 
k > 1, each term in the last sum is divisible by g, which is impossible 
since (x,y) = 1. Hence y = +1. 

When n = 0 (mod 3), equation (26) can be written in the form 


ee eed 
k>1\ 3k 3k +1 : 


when n = 1 (mod 3), 


_ n—4 73 (n—4) deat n—2 — 
m =( 3k ) @k+1)Qk+2) ’ 


2x 3kyn—Sk—4 qa(n—4)—k 


and when n = 2 (mod 3), 
_ yr 2qhn—2) me > ee ark yn—Bk—2 gh (n—2)—k 


k>1 
The same argument now shows that x = +1, and since it is clear 
from (25) that zy < 0, we have z = —y. 

Now let $q$ be a prime divisor of $d$, and suppose that $q^s \| d$ (that is, $q^s \mid d$ but $q^{s+1} \nmid d$). If $q^s > 5$, then $q^{sk} > 5^k \ge 3k + 2$ for $k \ge 1$, so that each term in the sum in (26) is divisible by $q$, which is impossible since $(x, d) = 1$. If $q = 3$, then since $3 \nmid (3k+1)(3k+2)$ we reach the same contradiction. Hence $q^s = 2$ or $5$, and $d = 2$, $5$, or $10$.

The information obtained so far shows that

$$1 - \binom{n-2}{3}\frac{2d}{4 \cdot 5} + \binom{n-2}{6}\frac{2d^2}{7 \cdot 8} - \cdots = 0. \tag{27}$$

If $d = 10$, this becomes

$$\binom{n-2}{3} - 1 = \frac{(n-5)(n^2-4n+6)}{6} = \sum_{k \ge 2}(-1)^k\binom{n-2}{3k}\frac{2 \cdot 10^k}{(3k+1)(3k+2)}.$$

This equation is true for $n = 5$, and leads to the first of the exceptions mentioned in the theorem. For other values of $n$, we may divide through by $(n-5)/6$ and obtain

$$n^2 - 4n + 6 = \sum_{k \ge 1}(-1)^{k+1}\binom{n-6}{3k-1}\frac{(n-2)(n-3)(n-4) \cdot 12 \cdot 10^{k+1}}{3k(3k+1)(3k+2)(3k+3)(3k+4)(3k+5)}.$$

The highest power of 5 which divides the denominator of a term in the sum is clearly at most $5(3k+5)$, and since $5^{k+1} > 5(3k+5)$ for $k \ge 2$, we have

$$n^2 - 4n + 6 \equiv \binom{n-6}{2}\frac{(n-2)(n-3)(n-4) \cdot 12 \cdot 10^2}{3 \cdot 4 \cdot 5 \cdot 6 \cdot 7 \cdot 8} \equiv 0 \pmod 5,$$

which is false since $-2$ is a quadratic nonresidue of 5.

When $d = 2$ or $5$, equation (27) leads to the congruence

$$1 + \binom{n-2}{3} + \binom{n-2}{6} + \cdots \equiv 0 \pmod 3,$$

which is false by Theorem 3-21.

There remains only the possibility that Y = 0. The proof that 
this happens only in the case of the second exception mentioned in 
the theorem is completely similar to what has just been done for the 
case Z = 0, and we leave the details to the reader. (The only varia- 
tion lies in the fact that d may now have the sole prime divisor 2, 
so that d = 2 or 4.) 


3-9 The Delaunay-Nagell theorem. As we shall see in the next chapter, there is a general theorem which implies that the equation

$$ax^3 + by^3 = c \tag{28}$$

has only finitely many solutions in integers $x$, $y$ if $a$, $b$, and $c$ are nonzero integers. In certain special cases, however, it is possible to make more precise statements about the number and nature of possible solutions. We shall concern ourselves here with the equation

$$x^3 + dy^3 = 1, \tag{29}$$

which was first considered in detail by B. Delaunay. His method was later refined by T. Nagell, who also applied it to (28) in the case that $c = 1$ or $3$. Nagell's result concerning (29) is as follows.


THEOREM 3-23. Equation (29) has at most one solution in integers $x$, $y$ different from zero. If $x_1$, $y_1$ is a solution, the number $x_1 + y_1\sqrt[3]{d}$ is either the fundamental unit of $L = R(\sqrt[3]{d})$ or its square; the latter can happen for only finitely many values of $d$.


If $d = \pm 1$, (29) has only trivial solutions. If $d$ contains a cube larger than 1, it can be absorbed into the factor $y^3$. Hence we can assume that $d$ is cube-free and larger than 1.

The idea of the proof is quite simple. If

$$N(x_1 + y_1\sqrt[3]{d}) = x_1^3 + dy_1^3 = 1, \qquad x_1y_1 \ne 0,$$

then $x_1 + y_1\sqrt[3]{d}$ is a positive unit of $L$, and as such is a positive power of the fundamental unit $\epsilon$ mentioned at the end of Section 3-7. It therefore suffices to show that no power of a positive unit smaller than 1, with exponent larger than 2, is of the special form $x + y\sqrt[3]{d}$, and to show that the square of a unit is of this form in only finitely many cases. We divide the proof into four parts, summarized in the next four theorems.


THEOREM 3-24. The square of an irrational unit of $L$ of the form

$$\eta = x + y\alpha + z\beta, \qquad x, y, z \text{ in } Z,$$

is itself of the form $X + Y\alpha$ only if

$$\eta = 1 + \sqrt[3]{20} - \sqrt[3]{50}.$$

The square of a unit of $L$ of the form

$$\eta = \tfrac13(x + y\alpha + z\beta), \qquad 3 \nmid xyz,$$

(if such exists) is itself of the form $X + Y\alpha$ for only finitely many values of $d$.


Proof: Let

$$\eta = x + y\alpha + z\beta$$

be a positive unit of $L$, so that, by (23),

$$x^3 + ab^2y^3 + a^2bz^3 - 3abxyz = 1 \tag{30}$$

and

$$\eta^2 = (x^2 + 2abyz) + (2xy + az^2)\alpha + (2xz + by^2)\beta.$$

If the coefficient of $\beta$ in this last expression is 0, then

$$z = -\frac{by^2}{2x},$$

and substituting this into (30) we obtain

$$x^3 + dy^3 - \frac{d^2y^6}{8x^3} + \frac{3dy^3}{2} = 1,$$

or

$$d^2y^6 - 20x^3dy^3 - 8(x^6 - x^3) = 0,$$

whence

$$dy^3 = 10x^3 \pm 2x\sqrt{27x^4 - 2x}. \tag{31}$$

Thus the number $27x^4 - 2x$ must be a square:

$$(27x^3 - 2)x = t^2. \tag{32}$$


If $x$ is even, then $(27x^3 - 2, x) = 2$, so that

$$27x^3 - 2 = \pm 2u^2, \qquad x = \pm 2v^2.$$

Since $-1$ is a quadratic nonresidue of 3, we must choose the lower sign, and eliminating $x$ we obtain

$$108v^6 + 1 = u^2, \qquad (u - 1)(u + 1) = 108v^6.$$

Since $(u - 1, u + 1) = 2$, this implies that

$$u \pm 1 = 54r^6, \qquad u \mp 1 = 2s^6,$$

whence

$$27r^6 - s^6 = (3r^2)^3 - (s^2)^3 = \pm 1.$$

From the truth of Fermat's conjecture for $n = 3$, it follows that $r = 0$, which gives $v = 0$ and $x = 0$. But then also $y = 0$, by (31), which is impossible since $z\beta$ is not a unit.

If $x$ is odd, (32) yields

$$27x^3 - 2 = \pm u^2, \qquad x = \pm v^2.$$

Here the upper sign must be chosen, and we have

$$(3x)^3 = u^2 + 2,$$

which by Theorem 3-19 has the sole solution $x = 1$, $u = \pm 5$. By (31), $dy^3 = 10 \pm 10$, so that $d = 20$, $y = 1$. (If $y = 0$, then $z = 0$, and $\eta$ is rational.) The sole solution is therefore

$$(1 + \sqrt[3]{20} - \sqrt[3]{50})^2 = -19 + 7\sqrt[3]{20}.$$
Now let $\eta$ be a positive unit of the form

$$\eta = \tfrac13(x + y\alpha + z\beta).$$

Then by (24),

$$x^3 + ab^2y^3 + a^2bz^3 - 3abxyz = 27, \tag{33}$$

and

$$9\eta^2 = (x^2 + 2abyz) + (2xy + az^2)\alpha + (2xz + by^2)\beta. \tag{34}$$

If $3 \mid x$, also $3 \mid y$ and $3 \mid z$, and we have already treated this case. Suppose that $3 \nmid x$. If the coefficient of $\beta$ in the expression for $\eta^2$ is 0, we again have

$$z = -\frac{by^2}{2x}.$$

Substituting this into (33), it follows that

$$dy^3 = 10x^3 \pm 6x\sqrt{3x^4 - 6x}, \tag{35}$$

so that

$$3x^4 - 6x = t^2. \tag{36}$$


If $x$ is even, the fact that $3 \nmid x$ implies that

$$x^3 - 2 = \pm 6u^2, \qquad x = \pm 2v^2,$$

whence

$$\pm 4v^6 - 1 = \pm 3u^2.$$

Since $3 \nmid (4v^6 + 1)$, we must choose the upper sign; the last equation can then be written as

$$(u + 1)^3 - (u - 1)^3 = (2v^2)^3,$$

so that $|u| = 1$. Hence $x = 2$, and by (35), $dy^3 = 80 \pm 72$. The lower sign yields $d = 1$ or $8$, both of which are excluded. Hence $d = 19$, $y = 2$, and $z = -1$. The only solution in this case is

$$\left(\frac{2 + 2\sqrt[3]{19} - \sqrt[3]{19^2}}{3}\right)^2 = -8 + 3\sqrt[3]{19}.$$
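Both explicit units found so far can be checked from the formulas for the norm and for the square. The sketch below is an illustration only: it stores $x + y\alpha + z\beta$ as the integers $(x, y, z)$ together with $a$, $b$ (so that $d = ab^2$, $\alpha^2 = b\beta$, $\beta^2 = a\alpha$, $\alpha\beta = ab$), and the function names `norm` and `square` are hypothetical.

```python
def norm(x, y, z, a, b):
    """Norm form x^3 + a b^2 y^3 + a^2 b z^3 - 3abxyz of x + y*alpha + z*beta."""
    return x**3 + a * b * b * y**3 + a * a * b * z**3 - 3 * a * b * x * y * z

def square(x, y, z, a, b):
    """Coefficients of 1, alpha, beta in (x + y*alpha + z*beta)^2."""
    return (x * x + 2 * a * b * y * z, 2 * x * y + a * z * z, 2 * x * z + b * y * y)
```

For $d = 20$ ($a = 5$, $b = 2$) the triple $(1, 1, -1)$ has norm 1 and square $(-19, 7, 0)$. For $d = 19$ ($a = 19$, $b = 1$) the triple $(2, 2, -1)$ has norm 27 and square $(-72, 27, 0)$, which after division by 9 is $-8 + 3\sqrt[3]{19}$.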
If $x$ is odd, (36) implies that

$$x^3 - 2 = \pm 3u^2, \qquad x = \pm v^2,$$

so that

$$\pm v^6 - 2 = \pm 3u^2.$$

The lower sign must be chosen: $x = -v^2$ and

$$3u^2 - 2 = v^6. \tag{37}$$

But it is an immediate consequence of Theorem 4-17, to be proved in the next chapter, that (37) has only finitely many solutions, and the proof is complete.

We note for future use that if $u$, $v$ satisfy (37), then $v$ must be odd.


THEOREM 3-25. The fourth power of a positive irrational unit of $L$ is never of the form $X + Y\alpha$.

Proof: Let $\epsilon$ be such a unit,

$$\epsilon = \tfrac13(x_1 + y_1\alpha + z_1\beta),$$

and suppose that

$$\epsilon^4 = X + Y\alpha.$$

Then since the coefficient of $\beta$ in $\epsilon^4$ is 0, we have

$$6bx_1^2y_1^2 + 4x_1^3z_1 + 4ab^2y_1^3z_1 + 12abx_1y_1z_1^2 + a^2bz_1^4 = 0. \tag{38}$$

If we put

$$\eta = \epsilon^2 = \tfrac13(x + y\alpha + z\beta),$$

then

$$x = \tfrac13(x_1^2 + 2aby_1z_1), \qquad y = \tfrac13(2x_1y_1 + az_1^2), \qquad z = \tfrac13(2x_1z_1 + by_1^2).$$

Since $\eta^2 = X + Y\alpha$, we can apply Theorem 3-24. The cases

$$d = 20, \qquad x = y = -z = 3,$$
$$d = 19, \qquad x = y = 2, \qquad z = -1$$

are impossible, since in the first the above equation for $z$ becomes $-9 = 2x_1z_1 + 2y_1^2$, while in the second the system is easily seen to be inconsistent for all choices of signs of $x_1$, $y_1$, $z_1$. Hence it must be that $x = -v^2$, where $v$ is odd, so that

$$3v^2 + x_1^2 = -2aby_1z_1.$$

Since $v$ is odd, so is $x_1$, so that $3v^2 + x_1^2 \equiv 4 \pmod 8$. Hence three of the numbers $a$, $b$, $y_1$, $z_1$ are odd, and the fourth is even. By (38), $a^2bz_1^4$ is even, so $y_1$ is odd. If either $a$ or $z_1$ is even, (38) implies that $6bx_1^2y_1^2 \equiv 0 \pmod 4$, which is false. If $b$ is even, (38) implies that $a^2bz_1^4 \equiv 0 \pmod 4$, which is false since $b$ is square-free. The proof is complete.


THEOREM 3-26. The cube of a positive irrational unit of $L$ is never of the form $X + Y\alpha$.

Proof: If

$$\eta = \tfrac13(x + y\alpha + z\beta)$$

is a positive unit, the coefficient of $\beta$ in $\eta^3$ is

$$\tfrac19(bxy^2 + x^2z + abyz^2).$$

We see from the equation

$$x^3 + ab^2y^3 + a^2bz^3 - 3abxyz = 27 \tag{39}$$

that $(x, b) = 1$, and deduce from the equation

$$bxy^2 + x^2z + abyz^2 = 0 \tag{40}$$

that $b \mid z$. From (39) again, $\delta = (x, y, z) = 1$ or $3$. Since $\eta \ne \pm 1$, $y$ and $z$ are not both zero, and we can write


$$x = \delta d_1d_2x_1, \qquad y = \delta d_2d_3y_1, \qquad z = \delta bd_1d_3z_1, \tag{41}$$


(5) (E)-m (8) ta 


and $x_1 > 0$, $y_1 > 0$, $z_1 > 0$. The numbers $d_1x_1$, $d_2y_1$, $d_3z_1$ are relatively prime in pairs. Substituting the values from (41) into (40) and dividing by $\delta^3bd_1d_2d_3$, we obtain

$$d_2^2d_3x_1y_1^2 + d_1^2d_2x_1^2z_1 + ab^2d_1d_3^2y_1z_1^2 = 0.$$

It follows from this that $d_1 \mid x_1$, $d_2 \mid y_1$, and $d_3 \mid z_1$. Putting

$$x_1 = d_1x_2, \qquad y_1 = d_2y_2, \qquad z_1 = d_3z_2,$$

substituting, and dividing by $d_1d_2d_3$, we obtain

$$d_2^3x_2y_2^2 + d_1^3x_2^2z_2 + ab^2d_3^3y_2z_2^2 = 0.$$

A consequence of this is that $x_2 \mid ab^2d_3^3y_2z_2^2$, which in turn implies that $x_2 = 1$. Similarly, $y_2 = z_2 = 1$, so that

$$d_1^3 + d_2^3 + ab^2d_3^3 = 0 \tag{42}$$

and

$$x = \delta d_1^2d_2, \qquad y = \delta d_2^2d_3, \qquad z = \delta bd_1d_3^2.$$

Substituting these values into (39), we obtain

$$d_1^6d_2^3 + ab^2d_2^6d_3^3 + a^2b^4d_1^3d_3^6 - 3ab^2d_1^3d_2^3d_3^3 = \left(\frac{3}{\delta}\right)^3.$$

Eliminating $ab^2d_3^3$ between this equation and (42), we have

$$d_1^9 + 6d_1^6d_2^3 + 3d_1^3d_2^6 - d_2^9 = \left(\frac{3}{\delta}\right)^3, \tag{43}$$

and putting $d_1^3 = u$, $d_2^3 = v$, $3/\delta = w$, this becomes

$$u^3 + 6u^2v + 3uv^2 - v^3 = w^3. \tag{44}$$

But it is easily verified that

$$(u^3 + 6u^2v + 3uv^2 - v^3)U^3 = V^3 + W^3,$$

where

$$U = u^2 + uv + v^2, \qquad V = u^3 + 3u^2v - v^3, \qquad W = 3u^2v + 3uv^2.$$

Since neither $U$ nor $V$ is zero for relatively prime $u$ and $v$, (44) can hold with $w \ne 0$ only if $W = 0$, that is, if $u = -v$. In this case $w = v$. Since $(d_1, d_2) = 1$, it follows that $d_1 = -1$, $d_2 = 1$, $\delta = 3$. This, however, leads to the values $x = 3$, $y = -3$, $z = 0$, for which the coefficient of $\beta$ in $\eta^3$ is not zero.
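The polynomial identity used in this proof, with $U = u^2 + uv + v^2$, $V = u^3 + 3u^2v - v^3$ and $W = 3u^2v + 3uv^2$, can be confirmed by exact integer evaluation. Both sides are homogeneous of degree 9, so agreement on a sufficiently large grid of integer pairs settles it. The Python sketch below is illustrative; the names `lhs`, `rhs`, and `identity_holds` are hypothetical.

```python
def lhs(u, v):
    """(u^3 + 6u^2 v + 3u v^2 - v^3) * U^3 with U = u^2 + uv + v^2."""
    big_u = u * u + u * v + v * v
    return (u**3 + 6 * u * u * v + 3 * u * v * v - v**3) * big_u**3

def rhs(u, v):
    """V^3 + W^3 with V = u^3 + 3u^2 v - v^3 and W = 3u^2 v + 3u v^2."""
    big_v = u**3 + 3 * u * u * v - v**3
    big_w = 3 * u * u * v + 3 * u * v * v
    return big_v**3 + big_w**3

def identity_holds(bound):
    """Compare both sides for all integer pairs with |u|, |v| <= bound."""
    rng = range(-bound, bound + 1)
    return all(lhs(u, v) == rhs(u, v) for u in rng for v in rng)
```

The reason is visible in the factorization $V^3 + W^3 = (V + W)(V^2 - VW + W^2)$: here $V + W$ is the cubic form on the left of (44), and $V^2 - VW + W^2 = U^3$.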

THEOREM 3-27. If $p > 3$ is prime and

$$\eta = \tfrac13(x + y\alpha + z\beta)$$

is a positive unit smaller than 1, then $\eta^p$ is not of the form $X + Y\alpha$.

Proof: Suppose that $z = 0$. Then $3 \mid x$ and $3 \mid y$, and

$$N\eta = \left(\frac{x}{3}\right)^3 + d\left(\frac{y}{3}\right)^3 = 1,$$

so that $\left(\frac{x}{3}, d\,\frac{y}{3}\right) = 1$; by Theorem 3-22, the coefficient of $\beta$ in $\eta^p$ is not zero. Thus $z \ne 0$. By the same reasoning (applied in the field $L' = R(\beta) = R(\alpha) = L$), $y$ cannot be zero.

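As we saw in the proof of Theorem 3-20, it follows from the representation

$$\omega = x_1 + x_2\alpha + x_3\beta$$

of an arbitrary integer $\omega$ of $L$ that

$$\alpha(\omega + \rho\omega' + \rho^2\omega'') = 3abx_3.$$

Taking $\omega = \eta^p$, we see that if the coefficient of $\beta$ in $\eta^p$ is zero, it must be that

$$\left(\frac{x + y\alpha + z\beta}{3}\right)^p + \rho\left(\frac{x + y\rho\alpha + z\rho^2\beta}{3}\right)^p + \rho^2\left(\frac{x + y\rho^2\alpha + z\rho\beta}{3}\right)^p = 0. \tag{45}$$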


Suppose first that $p \equiv 1 \pmod 3$. Then since $\rho^p = \rho$, (45) can be written in the form

$$\left(\frac{x\rho + y\rho^2\alpha + z\beta}{3}\right)^p + \left(\frac{x\rho^2 + y\rho\alpha + z\beta}{3}\right)^p = -\left(\frac{x + y\alpha + z\beta}{3}\right)^p.$$

Since $p$ is odd, the left side is divisible by

$$\frac{x\rho + y\rho^2\alpha + z\beta}{3} + \frac{x\rho^2 + y\rho\alpha + z\beta}{3} = \frac{-x - y\alpha + 2z\beta}{3};$$

this number is an integer, and since it divides $\eta^p$, it is a unit. Consequently,

$$-x^3 - ab^2y^3 + 8a^2bz^3 - 6abxyz = \pm 27.$$

Since $\eta$ is a positive unit, also

$$x^3 + ab^2y^3 + a^2bz^3 - 3abxyz = 27,$$

and by addition,

$$9a^2bz^3 - 9abxyz = 0 \text{ or } 54.$$

In the first case $az^2 - xy$ must be zero. But this number is the coefficient of $\alpha$ in $3/\eta$, and as we saw at the end of Section 3-7, it is not zero, since $1/\eta > 1$.

In the second case we have

$$abz(az^2 - xy) = 6.$$

But then $x$, $y$, $z$ are not all divisible by 3, so that $L$ is of the second kind. This is impossible, since if $ab \mid 6$ then $a^2 - b^2 \not\equiv 0 \pmod 9$.

The case in which $p \equiv 2 \pmod 3$ proceeds similarly. Equation (45) can be written in the form

$$\left(\frac{x\rho^2 + y\alpha + z\rho\beta}{3}\right)^p + \left(\frac{x\rho + y\alpha + z\rho^2\beta}{3}\right)^p = -\left(\frac{x + y\alpha + z\beta}{3}\right)^p,$$

from which it follows that the number

$$\frac{x\rho^2 + y\alpha + z\rho\beta}{3} + \frac{x\rho + y\alpha + z\rho^2\beta}{3} = \frac{-x + 2y\alpha - z\beta}{3}$$

is a unit. As before,

$$9ab^2y^3 - 9abxyz = 0 \text{ or } 54.$$

Since $by^2 - xz$ is the coefficient of $\beta$ in $3/\eta$, it is not zero. But it is also impossible that $(by^2 - xz) \mid 6$ and $ab \mid 6$, since then $L$ must be of both the first and second kinds. The proof is complete.


Theorems 3-25, 3-26, and 3-27 show that any nonzero solution of $x^3 + dy^3 = 1$ must correspond either to the fundamental unit of $L$, or to its square. Not both of these numbers can lead to solutions, by Theorem 3-22 with $n = 2$. This completes the proof of Theorem 3-23.


REFERENCES 


Section 3-5 

For a complete exposition of what is known concerning Fermat's conjecture, see H. S. Vandiver, "Fermat's last theorem: the history and the nature of the results concerning it," American Mathematical Monthly 53, 555-578 (1946). The result of Lehmer, Lehmer, and Vandiver was announced in Proceedings of the National Academy of Sciences 40, 25-33 (1954). Landau gives a proof of Kummer's lemma; see his Vorlesungen über Zahlentheorie, vol. 3, Leipzig: S. Hirzel Verlag, 1927.


Section 3-6 


The equation $y^2 = x^3 + k$ was the subject of L. J. Mordell's inaugural address, A Chapter in the Theory of Numbers, New York: Cambridge University Press, 1947. Also see Dickson's History of the Theory of Numbers, Washington: Carnegie Institution of Washington, 1919; reprinted, Chelsea Publishing Company, New York, 1950; vol. 2, pp. 531-539.


Section 3-7 


Dedekind's fundamental paper on pure cubic fields is in Journal für die reine und angewandte Mathematik (Berlin) 121, 40-123 (1899).


Sections 3-8, 3-9 


We have followed the treatment by Nagell, Journal des Mathématiques Pures et Appliquées (Paris) 4, 209-270 (1925). Delaunay (Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences (Paris) 171, 336 (1920) and 172, 434 (1921)) announced that equation (28) has at most five solutions in case $c = 1$. His work on (29) was announced in Comptes Rendus 162, 150-151 (1916).


CHAPTER 4 


THE THUE-SIEGEL-ROTH THEOREM 


4-1 Introduction. It is shown in introductory texts in number theory* that if $\alpha$ is a quadratic irrationality (that is, an algebraic number of degree two), then there is a positive constant $c$ such that

$$\left|\alpha - \frac{p}{q}\right| > \frac{c}{q^2}$$

for every pair of rational integers $p$, $q$ with $q > 0$. The idea used there suffices to prove the following generalization, which is due to J. Liouville.


THEOREM 4-1. If $\alpha$ is an algebraic number of degree $n \ge 2$, then there exists a positive constant $c$ such that

$$\left|\alpha - \frac{p}{q}\right| > \frac{c}{q^n} \tag{1}$$

for every pair of rational integers $p$, $q$ with $q > 0$.
Proof: Let $\alpha$ be a zero of the irreducible polynomial

$$f(x) = a_0x^n + \cdots + a_n, \qquad a_0 > 0,$$

with coefficients in $Z$, and let $\alpha_1 = \alpha, \alpha_2, \ldots, \alpha_n$ be its conjugates, so that

$$f(x) = a_0(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n).$$

Then the number

$$q^nf\left(\frac{p}{q}\right) = a_0p^n + a_1p^{n-1}q + \cdots + a_nq^n$$

is a rational integer different from zero, and it therefore has absolute


*See for example, Volume I, Section 8-4. In Section 8-5 Hurwitz’ 
theorem is stated and proved, and in Chapter 9 the problem of approxi- 
mating real numbers by rationals is considered; all this material is assumed 
in the present section. 


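value at least 1. Hence

$$\left|\alpha - \frac{p}{q}\right| \ge \frac{1}{a_0q^n\prod_{k=2}^{n}\left|\alpha_k - \frac{p}{q}\right|}.$$

Put

$$\beta = \overline{|\alpha|} = \max(|\alpha_1|, \ldots, |\alpha_n|).$$

We consider two cases, according as $|p/q|$ is greater than $2\beta$ or not. In the first case we have the trivial lower bound

$$\left|\alpha - \frac{p}{q}\right| > \beta.$$

In the second case the inequality $|\alpha_k - p/q| < 3\beta$ holds for $k = 2, \ldots, n$, and, by the inequality of the preceding paragraph,

$$\left|\alpha - \frac{p}{q}\right| > \frac{1}{a_0q^n(3\beta)^{n-1}}.$$

Thus, the theorem holds with

$$c = \min\left(\beta, \frac{1}{a_0(3\beta)^{n-1}}\right).$$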
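To see the size of the constant in a concrete case, take $\alpha = \sqrt[3]{2}$, a zero of $x^3 - 2$, so that $n = 3$, $a_0 = 1$, and all three conjugates have absolute value $\sqrt[3]{2}$. The Python sketch below is a floating-point illustration only; the names `liouville_constant` and `worst_ratio` are hypothetical, and the constant is assumed to have the shape $\min(\beta, 1/(a_0(3\beta)^{n-1}))$ produced by the two cases of the proof.

```python
ALPHA = 2 ** (1 / 3)  # real cube root of 2

def liouville_constant(a0, house, n):
    """Constant of the assumed shape min(beta, 1 / (a0 * (3*beta)^(n-1)))."""
    return min(house, 1.0 / (a0 * (3.0 * house) ** (n - 1)))

def worst_ratio(alpha, n, q_max):
    """Smallest value of q^n * |alpha - p/q| over 1 <= q <= q_max.

    For each q only the integers p nearest to alpha*q need to be tried.
    """
    best = float("inf")
    for q in range(1, q_max + 1):
        nearest = round(alpha * q)
        for p in (nearest - 1, nearest, nearest + 1):
            best = min(best, q**n * abs(alpha - p / q))
    return best
```

Comparing `worst_ratio(ALPHA, 3, 300)` with `liouville_constant(1, ALPHA, 3)` shows the inequality (1) holding with room to spare for small denominators, which is typical: the constant delivered by the proof is far from sharp.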


Liouville used this theorem to show the existence of nonalgebraic numbers; this will be discussed in detail in the next chapter. At the moment, let us consider a hypothetical improvement of Theorem 4-1, in which the inequality (1) is replaced by

$$\left|\alpha - \frac{p}{q}\right| > \frac{c}{q^\nu}, \tag{2}$$

where $\nu$ is any number smaller than $n$. A. Thue noticed that if such a theorem could be proved, it would have the important consequence that the Diophantine equation

$$q^nf\left(\frac{p}{q}\right) = a_0p^n + a_1p^{n-1}q + \cdots + a_nq^n = A \tag{3}$$

can have only finitely many solutions for any fixed rational integer $A$ different from zero, if $f(x)$ has distinct zeros. To see this, let the zeros of $f(x)$ again be $\alpha_1 = \alpha, \ldots, \alpha_n$, and put

$$\gamma = \min_{i \ne j}|\alpha_i - \alpha_j|.$$
tj 


Suppose that (3) has infinitely many solutions $p$, $q$. Then there must be at least one $\alpha_i$, which by suitable naming we can take to be $\alpha$, which is a limit point of the numbers $p/q$, since otherwise the quantity

$$q^nf\left(\frac{p}{q}\right) = a_0q^n\prod_{k=1}^{n}\left(\frac{p}{q} - \alpha_k\right)$$

is certainly not bounded as $q$ increases indefinitely. There must therefore be infinitely many solutions of (3) for which $|\alpha - p/q| < \gamma/2$. But for all such solutions,

$$\left|\alpha - \frac{p}{q}\right| = \frac{|A|}{a_0q^n\prod_{k=2}^{n}\left|\alpha_k - \frac{p}{q}\right|} < \frac{|A|}{a_0q^n}\left(\frac{2}{\gamma}\right)^{n-1},$$

and this is at variance with (2) if $\nu$ is a constant smaller than $n$ and $q$ is sufficiently large.
Thue showed that (2) holds with

$$\nu > \frac{n}{2} + 1.$$

Later C. L. Siegel improved Thue's result, showing that (2) holds with

$$\nu > \min_{\substack{1 \le s \le n-1 \\ s \in Z}}\left(\frac{n}{s+1} + s\right),$$

and in particular with $\nu = 2\sqrt{n}$. In 1947 F. J. Dyson made the further improvement $\nu > \sqrt{2n}$, and finally in 1955 K. F. Roth proved that (2) holds with $\nu = 2 + \epsilon$, for each $\epsilon > 0$, for all but a finite number of fractions $p/q$. This is the best theorem possible if $\nu$ is to be independent of $q$, since Hurwitz' theorem shows that the corresponding statement is false for every irrational algebraic number, for $\nu = 2$ and suitable $c$. Roth's work is similar in some respects to a simplification of Dyson's proof, published by T. Schneider in 1948.
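The successive exponents are easy to tabulate. The following Python sketch is purely illustrative (the function names `thue`, `siegel`, and `dyson` are hypothetical labels for the three bounds quoted above); it evaluates the threshold that $\nu$ must exceed in each result for a given degree $n \ge 2$, to be compared with Liouville's exponent $n$ and Roth's $2 + \epsilon$.

```python
from math import sqrt

def thue(n):
    """Thue's threshold n/2 + 1."""
    return n / 2 + 1

def siegel(n):
    """Siegel's threshold: the minimum of n/(s+1) + s over integers 1 <= s <= n-1."""
    return min(n / (s + 1) + s for s in range(1, n))

def dyson(n):
    """Dyson's threshold sqrt(2n)."""
    return sqrt(2 * n)

def table(degrees):
    """Rows (n, Liouville, Thue, Siegel, Dyson) for the given degrees."""
    return [(n, n, thue(n), siegel(n), dyson(n)) for n in degrees]
```

For $n = 3$ Siegel's minimum is attained at $s = 1$ and coincides with Thue's value $5/2$; the gain appears only for larger degrees, where the minimum stays below $2\sqrt{n}$.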
In addition to the problem of sharpening Theorem 4-1 by decreasing the exponent of $q$, we may also consider the question of extending the methods so as to analyze the approximability of an algebraic
number by other algebraic numbers. This is not mere generalization 
for its own sake: as we saw in the preceding chapter, it is natural to 




consider the solvability of x? + y? = z? in a larger set of integers 
than Z, and the same is true of many other Diophantine equations. 
But if the variables in an equation range over the integers of an 
algebraic number field, then to the extent that approximation 
theorems are useful at all they must be formulated in terms of alge- 
braic rather than rational numbers. 

While Siegel gave many algebraic variants of his basic result, Roth 
presented a detailed proof only in the rational case. In this chapter 
we give a complete proof of a useful algebraic version of Roth’s 
theorem. Unfortunately, the proof is complicated; the student 
might profit by first examining Schneider’s work mentioned above. 

We shall proceed as follows. In the next three sections we shall 
make some definitions, and obtain some preliminary results, which 
are needed for the proof of the main theorem: in Section 4-2 some 
properties of polynomials will be treated, in Section 4-3 the concept 
of the generalized Wronskian will be introduced, and in Section 4-4 
the index of a polynomial will be defined and discussed. Then we 
shall proceed to prove, in Sections 4-5 and 4-6, several lemmas on 
which the proof of the main theorem depends, and finally, in Section 
4-7, we shall state and prove the Thue-Siegel-Roth theorem itself. 
In the remainder of the chapter, some applications of the theorem 
will be taken up. 


4-2 Polynomials. If $P(z)$ is a polynomial with arbitrary complex coefficients, we denote by $\|P\|$ the maximum of the absolute values of its coefficients. If $\alpha$ is an algebraic number and $P(z) = 0$ is its defining equation, so that $P$ is irreducible and has relatively prime coefficients in $Z$, we define the height $H(\alpha)$ to be $\|P\|$. Finally, if $P$ has algebraic coefficients, we designate by $\overline{|P|}$ the maximum of the absolute values of their conjugates. Clearly $\|P\| = \overline{|P|}$ if $P$ has coefficients in $Z$, and for a nonzero constant polynomial $P(z) = \alpha$ the new definition of $\overline{|\alpha|}$ agrees with the old one.

Except when a polynomial is written as a determinant, it will be supposed that no two terms have the same exponents on the variable, or sets of exponents on the variables.


THEOREM 4-2. Let $l, \lambda_1, \ldots, \lambda_h$ be complex numbers, and put

$$L(z) = l\prod_{k=1}^{h}(z - \lambda_k).$$

Then

$$|l|\prod_{k=1}^{h}(1 + |\lambda_k|) \le 6^h\|L\|.$$


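Proof: There is no loss in generality in supposing that $l = 1$, since a change in $l$ affects in the same way the two sides of the inequality to be proved. Let $\lambda_1, \ldots, \lambda_t$ be those of the $\lambda$'s such that $|\lambda_k| \le 2$. If $f(z) = \prod_{k=1}^{t}(z - \lambda_k)$, then there is a complex number $z_0$ with $|z_0| = 1$ for which $|f(z_0)| \ge 1$. To see this, let $\epsilon$ be a primitive $(t+1)$th root of unity, and suppose that

$$f(z) = \sum_{r=0}^{t}\mu_rz^r, \qquad \mu_t = 1.$$

Then

$$\sum_{\nu=0}^{t}\epsilon^\nu f(\epsilon^\nu) = \sum_{\nu=0}^{t}\epsilon^\nu\sum_{r=0}^{t}\mu_r\epsilon^{\nu r} = \sum_{r=0}^{t}\mu_r\sum_{\nu=0}^{t}\epsilon^{\nu(r+1)}.$$

But

$$\sum_{\nu=0}^{t}\epsilon^{\nu(r+1)} = \begin{cases} 0 & \text{if } (t+1) \nmid (r+1), \\ t+1 & \text{if } (t+1) \mid (r+1), \end{cases} \tag{4}$$

and since $r \le t$, $(t+1) \mid (r+1)$ if and only if $r = t$. Hence

$$\sum_{\nu=0}^{t}\epsilon^\nu f(\epsilon^\nu) = (t+1)\mu_t = t+1,$$

so that one of the $t+1$ numbers $|f(\epsilon^\nu)|$ is at least 1. Thus

$$\prod_{k=1}^{t}(1 + |\lambda_k|) \le (1+2)^t = 3^t \le 3^t\left|\prod_{k=1}^{t}(z_0 - \lambda_k)\right|. \tag{5}$$

If $t < h$, then for $k = t+1, \ldots, h$ we have $|\lambda_k| > 2$ and

$$\frac{1 + |\lambda_k|}{|z_0 - \lambda_k|} \le \frac{|\lambda_k| + 1}{|\lambda_k| - 1} = 1 + \frac{2}{|\lambda_k| - 1} < 1 + \frac{2}{1} = 3,$$

so that

$$\prod_{k=t+1}^{h}(1 + |\lambda_k|) < 3^{h-t}\left|\prod_{k=t+1}^{h}(z_0 - \lambda_k)\right|.$$

Combining this with (5), we have

$$\prod_{k=1}^{h}(1 + |\lambda_k|) \le 3^h\left|\prod_{k=1}^{h}(z_0 - \lambda_k)\right| \le 3^h\|L\|(|z_0|^h + \cdots + 1) = 3^h(h+1)\|L\| \le 6^h\|L\|.$$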
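A numerical instance may help fix the meaning of the two sides of Theorem 4-2. The Python sketch below is an illustration with hypothetical helper names (`poly_from_roots`, `sup_norm`, `both_sides`): it expands $l\prod(z - \lambda_k)$ into coefficients, takes the largest absolute value of a coefficient as $\|L\|$, and returns the pair $\bigl(|l|\prod(1 + |\lambda_k|),\ 6^h\|L\|\bigr)$.

```python
def poly_from_roots(lead, roots):
    """Coefficients, lowest degree first, of lead * prod (z - r) over the roots."""
    coeffs = [complex(lead)]
    for r in roots:
        shifted = [0j] + coeffs                   # z * (current polynomial)
        scaled = [-r * c for c in coeffs] + [0j]  # -r * (current polynomial)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

def sup_norm(coeffs):
    """Largest absolute value of a coefficient."""
    return max(abs(c) for c in coeffs)

def both_sides(lead, roots):
    """Return (|l| * prod(1 + |lambda_k|), 6^h * ||L||) for L = l * prod(z - lambda_k)."""
    left = abs(lead)
    for r in roots:
        left *= 1 + abs(r)
    right = 6 ** len(roots) * sup_norm(poly_from_roots(lead, roots))
    return left, right
```

With `both_sides(1, [2, -3, 0.5j])` the left side is $3 \cdot 4 \cdot 1.5 = 18$, far below the right side; the factor $6^h$ is generous, and what matters later is only that it grows at most exponentially in the degree.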


THEOREM 4-3. Suppose that $f(z)$ and $g(z)$ are polynomials with complex coefficients, of degrees $n$ and $m$ respectively. Suppose further that the coefficient of $z^m$ in $g(z)$ has absolute value at least 1. Then

$$\|f\| \le 6^{n+m}\|fg\|.$$

Proof: Let

$$f(z) = a_0(z - \lambda_1) \cdots (z - \lambda_n), \qquad g(z) = b_0(z - \lambda_{n+1}) \cdots (z - \lambda_{n+m}).$$

Then

$$\|f\| \le |a_0|\prod_{k=1}^{n}(1 + |\lambda_k|) \le |a_0b_0|\prod_{k=1}^{n}(1 + |\lambda_k|) \le |a_0b_0|\prod_{k=1}^{n+m}(1 + |\lambda_k|),$$

and the desired result follows from Theorem 4-2.


THEOREM 4-4. If $f(z)$ is an arbitrary polynomial of degree $n$, with real coefficients, then

$$\|f\|^m \le (mn + 1)\|f^m\|. \tag{6}$$

Proof: Let $f(z) = a_0 + a_1z + \cdots + a_nz^n$, and let $\|f\| = a$. The theorem is certainly true if either $|a_0| = a$ or $|a_n| = a$, since the first and last coefficients in $f^m(z)$ are the $m$th powers of $a_0$ and $a_n$, respectively, so that in this case $\|f^m\| \ge \|f\|^m$. If we put

$$f^*(z) = z^nf\left(\frac{1}{z}\right),$$

then clearly

$$\|f^*\| = \|f\| \quad \text{and} \quad \|(f^*)^m\| = \|f^m\|,$$

so that we can suppose, with no loss in generality, that the numerically largest of all the coefficients in $f(z)$ is $a_t$, where $\tfrac12n \le t < n$.

Put

$$g(z, \theta) = f(z) - ae^{i\theta}z^n,$$

and let $\alpha = \alpha(\theta)$ be the numerically largest of the zeros of $g(z, \theta)$ for each $\theta$. The inequality (6) holds if, for some $\theta$, $|\alpha(\theta)| \ge 1$. For

$$|f^m(\alpha)| = |ae^{i\theta}\alpha^n|^m = a^m|\alpha|^{mn},$$

while for $|\alpha| \ge 1$,

$$|f^m(\alpha)| \le \|f^m\|(1 + |\alpha| + \cdots + |\alpha|^{mn}) \le \|f^m\|(mn + 1)|\alpha|^{mn},$$

so that

$$a^m = \|f\|^m \le \|f^m\|(mn + 1).$$

We know that $|a_n| < a$. Hence if $f(1) > a$, then

$$g(1, 0) > a - a = 0, \qquad g(\infty, 0) = -\infty,$$

and $1 < |\alpha(0)| < \infty$. Similarly, if $f(1) < -a$, then

$$g(1, \pi) < -a + a = 0, \qquad g(\infty, \pi) = \infty,$$

and $1 < |\alpha(\pi)| < \infty$. This proves the theorem unless $|f(1)| \le a$, which we henceforth assume.

Now put $z = e^{i\varphi}$, so that

$$g(e^{i\varphi}, \theta) = f(e^{i\varphi}) - ae^{i(\theta + n\varphi)}.$$

If we find a $\varphi_0$ such that $|f(e^{i\varphi_0})| = a$, then $\theta_0$ can be determined so that $g(e^{i\varphi_0}, \theta_0) = 0$; this gives $|\alpha(\theta_0)| \ge 1$ and proves the theorem. Since $|f(e^{i\varphi})|$ is a continuous function of $\varphi$, and since $|f(1)| \le a$, it suffices to prove the existence of a $\varphi_0$ such that $|f(e^{i\varphi_0})| \ge a$.

Let $\epsilon$ be a primitive $(t+1)$th root of unity, where $|a_t| = a$ and $\tfrac12n \le t < n$. Then

$$\sum_{\nu=0}^{t}\epsilon^\nu f(\epsilon^\nu) = \sum_{\nu=0}^{t}\epsilon^\nu\sum_{k=0}^{n}a_k\epsilon^{\nu k} = \sum_{k=0}^{n}a_k\sum_{\nu=0}^{t}\epsilon^{\nu(k+1)}.$$

Since $k \le n$ and $t \ge \tfrac12n$, we have that $(t+1) \mid (k+1)$ if and only if $k = t$. Hence, by (4),

$$\sum_{\nu=0}^{t}\epsilon^\nu f(\epsilon^\nu) = a_t(t+1),$$

so that for some $\nu$,

$$|\epsilon^\nu f(\epsilon^\nu)| = |f(\epsilon^\nu)| \ge |a_t| = a.$$

The proof is complete.
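Inequality (6) can be tried on integer polynomials with a few lines of exact arithmetic. The Python sketch below is illustrative only; polynomials are coefficient lists with the constant term first, and the names `poly_mul`, `poly_pow`, `sup_norm`, `holds`, and `random_check` are hypothetical.

```python
import random

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poly_pow(f, m):
    """The m-th power of f, for an integer m >= 0."""
    result = [1]
    for _ in range(m):
        result = poly_mul(result, f)
    return result

def sup_norm(f):
    """Largest absolute value of a coefficient."""
    return max(abs(c) for c in f)

def holds(f, m):
    """Check ||f||^m <= (m*n + 1) * ||f^m||, where n = len(f) - 1 is the degree."""
    n = len(f) - 1
    return sup_norm(f) ** m <= (m * n + 1) * sup_norm(poly_pow(f, m))

def random_check(trials, seed):
    """Try the inequality on random integer polynomials with nonzero leading coefficient."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 5)
        f = [rng.randint(-5, 5) for _ in range(n)] + [rng.choice([-3, -1, 1, 2])]
        if not holds(f, rng.randint(1, 4)):
            return False
    return True
```

For example, with $f = (1 - z)^2$ and $m = 2$ one has $\|f\|^2 = 4$ and $\|f^2\| = 6$, so the factor $mn + 1 = 5$ is not even needed; it matters when the middle coefficients of $f^m$ cancel.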


THEOREM 4-5. If $f_1(z), \ldots, f_k(z)$ are polynomials with algebraic coefficients, then

$$\overline{|f_1 \cdots f_k|} \le \prod_{j=1}^{k}(1 + \deg f_j)\prod_{j=1}^{k}\overline{|f_j|}.$$

Proof: There is no loss in generality in supposing that

$$\deg f_1 \ge \deg f_2 \ge \cdots \ge \deg f_k.$$

The product $f_1f_2$ is a polynomial each of whose coefficients is a sum of products of a coefficient of $f_1$ and a coefficient of $f_2$, the number of summands being at most $1 + \deg f_2$. Hence

$$\overline{|f_1f_2|} \le (1 + \deg f_2)\,\overline{|f_1|}\;\overline{|f_2|}.$$

Similarly,

$$\overline{|f_1f_2f_3|} \le (1 + \deg f_3)\,\overline{|f_1f_2|}\;\overline{|f_3|} \le (1 + \deg f_3)(1 + \deg f_2)\,\overline{|f_1|}\;\overline{|f_2|}\;\overline{|f_3|},$$

and so on.
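For polynomials with rational integer coefficients the maximum over conjugates is simply the largest absolute value of a coefficient, so the bound of Theorem 4-5 can be tested directly. The Python sketch below is an illustration restricted to that special case; the names `poly_mul`, `sup_norm`, and `product_and_bound` are hypothetical, and the bound is taken in the form of the product of all factors $1 + \deg f_j$ times the product of the norms.

```python
def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def sup_norm(f):
    """Largest absolute value of a coefficient."""
    return max(abs(c) for c in f)

def product_and_bound(polys):
    """Return (norm of the product, product of (1 + deg f_j) * norm of f_j)."""
    product = [1]
    bound = 1
    for f in polys:
        product = poly_mul(product, f)
        bound *= len(f) * sup_norm(f)  # len(f) equals 1 + deg f
    return sup_norm(product), bound
```

With the three polynomials $1 + z + z^2$, $1 + z$, and $2 - 3z$ the product is $2 + z - 2z^2 - 4z^3 - 3z^4$, of norm 4, against a bound of $3 \cdot 2 \cdot 2 \cdot 3 = 36$.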


THEOREM 4-6. Let $p$ and $r$ be positive integers, with $1 \le r < p$. Suppose that $F(z_1, \ldots, z_p)$, $G(z_1, \ldots, z_r)$, and $H(z_{r+1}, \ldots, z_p)$ are polynomials with coefficients in an algebraic number field $K$, those of $F$ being integers, and suppose that

$$F(z_1, \ldots, z_p) = G(z_1, \ldots, z_r)H(z_{r+1}, \ldots, z_p).$$

Then if $\gamma$ is any coefficient in $F$, there is a factorization $\gamma = \alpha\beta$ in $K$ such that the coefficients in $\alpha H$ and $\beta G$ are integers in $K$.

Proof: Let the coefficients in $G$ be $\alpha_1, \ldots, \alpha_s$, and those in $H$ be $\beta_1, \ldots, \beta_t$, in some order. Then, since the variables in $G$ and $H$ are disjoint, the coefficients in $F$ are simply the products $\alpha_i\beta_j$. Since the coefficients in $F$ are integers, all the products $\alpha_i\beta_1, \ldots, \alpha_i\beta_t$ are integers, as are all the products $\beta_j\alpha_1, \ldots, \beta_j\alpha_s$. But these two sets of numbers are just the coefficients in $\alpha_iH$ and $\beta_jG$.


4-3 Generalized Wronskians. Polynomials $f_0(z_1, \ldots, z_p), \ldots, f_{l-1}(z_1, \ldots, z_p)$ with coefficients in an algebraic number field $K$ are said to be linearly dependent if some linear combination of them, with constant coefficients in $K$ which are not all zero, vanishes identically, and are otherwise said to be independent. In the case of a single independent variable, it is well known that the question of independence of a set of functions can sometimes be settled by reference to their Wronskian. For our purposes it is convenient to define this as the determinant

$$W(z) = \det\left(\frac{1}{\mu!}\left(\frac{d}{dz}\right)^{\mu}f_\nu(z)\right), \qquad \mu, \nu = 0, 1, \ldots, l-1,$$

which differs from the usual definition only in the presence of the nonzero constant factor

$$\frac{1}{0!\,1! \cdots (l-1)!}.$$

The exact relation of the behavior of the Wronskian to independence, as applied to polynomials, is indicated in the first part of the next theorem.
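The normalized Wronskian of a few polynomials in one variable can be computed exactly. The Python sketch below is an illustration under stated assumptions: polynomials are lists of coefficients with the constant term first, arithmetic is done with `Fraction` so that the factors $1/\mu!$ are exact, the determinant is expanded along the first row, and all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def trim(f):
    """Drop trailing zero coefficients, keeping at least one entry."""
    f = list(f)
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f

def add(f, g):
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0) for i in range(n)])

def mul(f, g):
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

def derivative(f):
    return [k * c for k, c in enumerate(f)][1:] or [Fraction(0)]

def det(matrix):
    """Determinant of a square matrix of polynomials, by expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = [Fraction(0)]
    for j, entry in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        term = mul(entry, det(minor))
        total = add(total, [(-1) ** j * c for c in term])
    return total

def wronskian(polys):
    """det( (1/mu!) (d/dz)^mu f_nu ), rows indexed by mu, columns by nu."""
    current = [[Fraction(c) for c in f] for f in polys]
    rows = []
    for mu in range(len(polys)):
        rows.append([[c / factorial(mu) for c in f] for f in current])
        current = [derivative(f) for f in current]
    return trim(det(rows))
```

For the dependent pair $1 + z$, $2 + 2z$ the result is the zero polynomial, while for $1$, $z$, $z^2$ the normalization by $1/\mu!$ makes the Wronskian exactly 1.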

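For functions of several variables, the situation is not quite so simple, since there are then several partial derivatives to consider. We proceed as follows. Let $\Delta_0, \Delta_1, \ldots, \Delta_\mu, \ldots, \Delta_{l-1}$ be differential operators of the form

$$\frac{1}{j_1! \cdots j_p!}\left(\frac{\partial}{\partial z_1}\right)^{j_1} \cdots \left(\frac{\partial}{\partial z_p}\right)^{j_p},$$

such that the order $j_1 + \cdots + j_p$ of $\Delta_\mu$ does not exceed $\mu$, for $0 \le \mu \le l-1$. Then the function

$$G(z_1, \ldots, z_p) = \begin{vmatrix} \Delta_0f_0 & \Delta_0f_1 & \cdots & \Delta_0f_{l-1} \\ \Delta_1f_0 & \Delta_1f_1 & \cdots & \Delta_1f_{l-1} \\ \vdots & \vdots & & \vdots \\ \Delta_{l-1}f_0 & \Delta_{l-1}f_1 & \cdots & \Delta_{l-1}f_{l-1} \end{vmatrix}$$

is called a generalized Wronskian of $f_0, \ldots, f_{l-1}$. Except in the trivial case $p = l = 1$, there are several $\Delta_\mu$'s for each $\mu$, and hence more than one generalized Wronskian. In the case of functions of one variable, the ordinary Wronskian is that generalized Wronskian for which the order of $\Delta_\mu$ is exactly $\mu$, for $0 \le \mu \le l-1$.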


THEOREM 4-7. (a) If $f_0, \ldots, f_{l-1}$ are $l$ polynomials over $K$ in the single variable $z$, whose Wronskian $W(z)$ vanishes identically, then they are dependent over $K$.

(b) If $f_0, \ldots, f_{l-1}$ are $l$ polynomials over $K$ in the variables $z_1, \ldots, z_p$, for which every generalized Wronskian $G(z_1, \ldots, z_p)$ vanishes identically, then they are dependent over $K$.

Proof: (a) The proof in this case is by induction. If $l = 1$, then $W(z) = f_0(z)$, and the truth of the theorem is obvious.

Take $l > 1$, and suppose that the theorem is true for every set of $l - 1$ polynomials, $f_0, f_1, \ldots, f_{l-2}$, over $K$; suppose also that the Wronskian $W_l$ of $f_0, \ldots, f_{l-1}$ vanishes identically. If $f_0, \ldots, f_{l-2}$ are dependent, so are $f_0, \ldots, f_{l-1}$, and the assertion is proved. Suppose then that $f_0, \ldots, f_{l-2}$ are independent, so that their Wronskian $W_{l-1}$ is not identically zero. Now $W_{l-1}$, being a polynomial, has only finitely many zeros; let $I$ be an interval in which it does not vanish, and take $z$ in $I$. For such $z$, the system of equations

$$\sum_{k=0}^{l-2}f_k^{(j)}(z)y_k = f_{l-1}^{(j)}(z), \qquad j = 0, 1, \ldots, l-2, \tag{7}$$

can be solved for the $y$'s as rational functions of $z$. But then, by subtracting appropriate multiples of each column of $W_l$ from its last column, we obtain

$$0 = 1! \cdots (l-1)!\,W_l = \begin{vmatrix} f_0(z) & \cdots & f_{l-2}(z) & 0 \\ f_0'(z) & \cdots & f_{l-2}'(z) & 0 \\ \vdots & & \vdots & \vdots \\ f_0^{(l-1)}(z) & \cdots & f_{l-2}^{(l-1)}(z) & f_{l-1}^{(l-1)}(z) - \sum_{k=0}^{l-2}f_k^{(l-1)}(z)y_k \end{vmatrix}$$

$$= 1! \cdots (l-2)!\left(f_{l-1}^{(l-1)}(z) - \sum_{k=0}^{l-2}f_k^{(l-1)}(z)y_k\right)W_{l-1},$$

so that also

$$\sum_{k=0}^{l-2}f_k^{(l-1)}(z)y_k = f_{l-1}^{(l-1)}(z). \tag{8}$$

Differentiating (7) gives

$$\sum_{k=0}^{l-2}f_k^{(j+1)}(z)y_k + \sum_{k=0}^{l-2}f_k^{(j)}(z)y_k' = f_{l-1}^{(j+1)}(z), \qquad j = 0, \ldots, l-2,$$

and comparison of this for $j = l-2$ with (8), and for $j = 0, \ldots, l-3$ with (7), shows that

$$\sum_{k=0}^{l-2}f_k^{(j)}(z)y_k' = 0, \qquad j = 0, \ldots, l-2.$$

Since $W_{l-1} \ne 0$, it must be that

$$y_0' = \cdots = y_{l-2}' = 0,$$

so that the $y$'s are constants, say $y_k = c_k$, and they are clearly in $K$. But then the polynomial

$$\sum_{k=0}^{l-2}c_kf_k(z) - f_{l-1}(z)$$

vanishes throughout $I$, and therefore identically, so that the $l$ polynomials $f_0, f_1, \ldots, f_{l-1}$ are dependent.

(b) This case is proved by contradiction. Suppose that the $l$ polynomials $f_0(z_1, \ldots, z_p), \ldots, f_{l-1}(z_1, \ldots, z_p)$ are independent, and suppose further that for each $\nu$, $f_\nu$ is of degree less than $k$ in each of its arguments, so that we can write

$$f_\nu(z_1, \ldots, z_p) = \sum_{k_1=0}^{k-1} \cdots \sum_{k_p=0}^{k-1}b_\nu(k_1, \ldots, k_p)z_1^{k_1} \cdots z_p^{k_p}, \qquad 0 \le \nu \le l-1.$$

Then the polynomials $f_\nu(t, t^k, t^{k^2}, \ldots, t^{k^{p-1}})$ are linearly independent. For otherwise there would be an identity in $t$ of the form

$$\sum_{\nu=0}^{l-1}c_\nu\sum_{k_1=0}^{k-1} \cdots \sum_{k_p=0}^{k-1}b_\nu(k_1, \ldots, k_p)t^{k_1 + k_2k + \cdots + k_pk^{p-1}} = 0,$$

or

$$\sum_{k_1=0}^{k-1} \cdots \sum_{k_p=0}^{k-1}\left(\sum_{\nu=0}^{l-1}c_\nu b_\nu(k_1, \ldots, k_p)\right)t^{k_1 + k_2k + \cdots + k_pk^{p-1}} = 0,$$

and it would follow from the uniqueness of the representation of an integer to the base $k$ that for each set of exponents $k_1, \ldots, k_p$,

$$\sum_{\nu=0}^{l-1}c_\nu b_\nu(k_1, \ldots, k_p) = 0,$$

whence

$$\sum_{\nu=0}^{l-1}c_\nu f_\nu(z_1, \ldots, z_p) = 0,$$

contrary to assumption.

We know therefore that the Wronskian

$$W(t) = \det\left(\frac{1}{\mu!}\left(\frac{d}{dt}\right)^{\mu}f_\nu(t, t^k, \ldots, t^{k^{p-1}})\right), \qquad \mu, \nu = 0, \ldots, l-1,$$

does not vanish identically. By a standard differentiation formula,

$$\frac{d}{dt}f(t, t^k, \ldots, t^{k^{p-1}}) = \left(\frac{\partial}{\partial z_1} + kt^{k-1}\frac{\partial}{\partial z_2} + \cdots + k^{p-1}t^{k^{p-1}-1}\frac{\partial}{\partial z_p}\right)f(z_1, \ldots, z_p)\Big|_{z_j = t^{k^{j-1}}},$$

and it follows easily by induction on $\mu$ that an operator identity

$$\left(\frac{d}{dt}\right)^{\mu} = \varphi_1(t)\Delta^{(1)} + \cdots + \varphi_r(t)\Delta^{(r)}$$

holds, where $\Delta^{(1)}, \ldots, \Delta^{(r)}$ are differential operators of orders not exceeding $\mu$, $r$ depends only on $\mu$ and $p$, and $\varphi_1, \ldots, \varphi_r$ are polynomials with rational coefficients. Using this in the above expression for $W(t)$, and writing the resulting determinant as a sum of other determinants, an expression for $W(t)$ of the form

$$W(t) = \psi_1(t)G_1(t, \ldots, t^{k^{p-1}}) + \cdots + \psi_s(t)G_s(t, \ldots, t^{k^{p-1}})$$

results, in which $\psi_1, \ldots, \psi_s$ are polynomials and $G_1, \ldots, G_s$ are generalized Wronskians of $f_0, \ldots, f_{l-1}$. Since $W(t)$ does not vanish identically, there is an $i$ for which $G_i(t, \ldots, t^{k^{p-1}})$ is not identically zero, and a fortiori $G_i(z_1, \ldots, z_p)$ is not identically zero.
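The substitution $z_j \to t^{k^{j-1}}$ used in part (b) works because a monomial with exponents $k_1, \ldots, k_p$, each less than $k$, goes to the power of $t$ whose exponent has base-$k$ digits $k_1, \ldots, k_p$, so distinct monomials never collide. The Python sketch below illustrates this; the dictionary representation and the names `substitute` and `digits` are hypothetical.

```python
from itertools import product

def substitute(poly, k):
    """Replace z_j by t^(k^(j-1)) in a polynomial given as {exponent tuple: coefficient}.

    Returns a one-variable polynomial as {exponent of t: coefficient}.
    """
    out = {}
    for exponents, coeff in poly.items():
        e = sum(kj * k**j for j, kj in enumerate(exponents))
        out[e] = out.get(e, 0) + coeff
    return out

def digits(e, k, p):
    """Recover (k_1, ..., k_p) from e = k_1 + k_2*k + ... + k_p*k^(p-1)."""
    result = []
    for _ in range(p):
        e, r = divmod(e, k)
        result.append(r)
    return tuple(result)
```

For example, with $k = 3$ the polynomial $1 + 2z_1 + 3z_2 - 4z_1^2z_2$ becomes $1 + 2t + 3t^3 - 4t^5$, and `digits(5, 3, 2)` returns the original exponent pair $(2, 1)$.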

THEOREM 4-8. Let $R(z_1, \ldots, z_p)$ be a polynomial in $p \ge 2$ variables, with integral coefficients in $K$ such that

$$0 < \overline{|R|} \le B.$$

Let $R$ be of degree at most $r_j$ in $z_j$, for $j = 1, \ldots, p$. Then there is an $l$ in $Z$ with

$$1 \le l \le r_p + 1, \tag{9}$$

there is an integer $\beta$ in $K$, and there are differential operators $\Delta_0, \ldots, \Delta_{l-1}$ on the variables $z_1, \ldots, z_{p-1}$, of orders at most $0, \ldots, l-1$, respectively, such that if

$$F(z_1, \ldots, z_p) = \beta\det\left(\Delta_\mu\frac{1}{\nu!}\left(\frac{\partial}{\partial z_p}\right)^{\nu}R\right), \qquad \mu, \nu = 0, \ldots, l-1, \tag{10}$$

then

(a) $F$ has integral coefficients in $K$ and is not identically zero;

(b) a decomposition

$$F(z_1, \ldots, z_p) = U(z_1, \ldots, z_{p-1})V(z_p) \tag{11}$$

holds, where $U$ and $V$ have integral coefficients in $K$, $U$ is of degree at most $lr_j$ in $z_j$ for $j = 1, \ldots, p-1$, and $V$ is of degree at most $lr_p$ in $z_p$;

(c) the following bound holds:

$$\overline{|F|} \le \{(r_1+1) \cdots (r_p+1)\}^{2l}(l!)^2\,2^{2(r_1 + \cdots + r_p)l}B^{2l}.$$

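Proof: Write $R$ as a polynomial in $z_p$:

$$R(z_1, \ldots, z_p) = \sum_{\kappa=0}^{r_p}S_\kappa(z_1, \ldots, z_{p-1})z_p^{\kappa}.$$

The polynomials $S_\kappa$ need not be independent; let $\psi_\nu(z_1, \ldots, z_{p-1})$, for $\nu = 0, \ldots, l-1$, be a maximal set of independent polynomials among the $S_\kappa$, so that $1 \le l \le r_p + 1$. Then there are constants $\beta_{\kappa\nu}$ in $K$ such that for $\kappa = 0, \ldots, r_p$,

$$S_\kappa(z_1, \ldots, z_{p-1}) = \sum_{\nu=0}^{l-1}\beta_{\kappa\nu}\psi_\nu(z_1, \ldots, z_{p-1}). \tag{12}$$

If we put

$$\varphi_\nu(z_p) = \sum_{\kappa=0}^{r_p}\beta_{\kappa\nu}z_p^{\kappa}, \qquad \nu = 0, \ldots, l-1,$$

then

$$R(z_1, \ldots, z_p) = \sum_{\nu=0}^{l-1}\psi_\nu(z_1, \ldots, z_{p-1})\varphi_\nu(z_p), \tag{13}$$

and $\varphi_0, \ldots, \varphi_{l-1}$ are independent. For if $\delta_0, \ldots, \delta_{l-1}$ are constants such that

$$\delta_0\varphi_0(z_p) + \cdots + \delta_{l-1}\varphi_{l-1}(z_p) = 0,$$

the coefficient of each power of $z_p$ must be zero, so that

$$\delta_0\beta_{\kappa 0} + \cdots + \delta_{l-1}\beta_{\kappa, l-1} = 0 \tag{14}$$

for $\kappa = 0, \ldots, r_p$. For fixed $\nu_0$ with $0 \le \nu_0 \le l-1$, choose $\kappa_0$ so that $S_{\kappa_0}(z_1, \ldots, z_{p-1}) = \psi_{\nu_0}(z_1, \ldots, z_{p-1})$; this is possible since the $\psi$'s are a subset of the $S$'s. Then (12) shows that

$$\beta_{\kappa_0\nu} = \begin{cases} 1 & \text{if } \nu = \nu_0, \\ 0 & \text{if } \nu \ne \nu_0. \end{cases}$$

Choosing $\kappa = \kappa_0$ in (14), we obtain $\delta_{\nu_0} = 0$. Since $\nu_0$ is arbitrary, every $\delta_\nu = 0$.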

Let $W(z_p)$ be the Wronskian of $\varphi_0, \ldots, \varphi_{l-1}$; it is a polynomial with coefficients in $K$, and it does not vanish identically. Let $G(z_1, \ldots, z_{p-1})$ be some generalized Wronskian of $\psi_0, \ldots, \psi_{l-1}$ which is not identically zero. Then

$$W(z_p) = \det\left(\frac{1}{\mu!}\left(\frac{d}{dz_p}\right)^{\mu}\varphi_\nu(z_p)\right), \qquad G(z_1, \ldots, z_{p-1}) = \det\left(\Delta_\mu\psi_\nu(z_1, \ldots, z_{p-1})\right), \qquad \mu, \nu = 0, \ldots, l-1,$$

where $\Delta_0, \ldots, \Delta_{l-1}$ are differential operators on $z_1, \ldots, z_{p-1}$, of orders at most $0, \ldots, l-1$ respectively. Taking the row-by-row product of $G$ and $W$, we obtain

$$GW = \det\left(\sum_{\rho=0}^{l-1}\Delta_\mu\frac{1}{\nu!}\left(\frac{\partial}{\partial z_p}\right)^{\nu}\varphi_\rho(z_p)\psi_\rho(z_1, \ldots, z_{p-1})\right),$$

or

$$GW = \det\left(\Delta_\mu\frac{1}{\nu!}\left(\frac{\partial}{\partial z_p}\right)^{\nu}R\right). \tag{15}$$

Since $W$ is a determinant of order $l$ whose elements are polynomials in $z_p$ of degrees at most $r_p$, it is clear that $\deg W \le lr_p$. Similarly, $G$ is of degree at most $lr_j$ in $z_j$, for $j = 1, \ldots, p-1$.

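In the expression (15) for $GW$, we can write $R$ as the sum of
$(r_1+1)\cdots(r_p+1)$ terms of the form

$$\alpha_{s_1, \ldots, s_p}\, z_1^{s_1} \cdots z_p^{s_p}.$$

The determinant can then be written as a sum of

$$((r_1+1)\cdots(r_p+1))^{l}$$

new determinants, each having entries of the form

$$\alpha_{s_1, \ldots, s_p}\, \Delta_\mu \frac{1}{\nu!}\left(\frac{\partial}{\partial z_p}\right)^{\nu} z_1^{s_1} \cdots z_p^{s_p} = \alpha_{s_1, \ldots, s_p} \binom{s_1}{t_1} \cdots \binom{s_p}{t_p}\, z_1^{t_1} \cdots z_p^{t_p},$$

in which $t_j \le s_j$ for $j = 1, \ldots, p$. Here

$$\binom{s_1}{t_1} \cdots \binom{s_p}{t_p} \le 2^{s_1 + \cdots + s_p} \le 2^{r_1 + \cdots + r_p}.$$

Thus the entries of each new determinant are such that the maxima
of the absolute values of their conjugates do not exceed

$$2^{r_1 + \cdots + r_p} B,$$

and hence

$$\overline{|GW|} \le ((r_1+1)\cdots(r_p+1))^{l}\; l!\; 2^{(r_1+\cdots+r_p)l}\, B^{l}.$$

The coefficients in $GW$ are integers in $K$. It follows from Theorem
4-6 that if $\beta$ is any one of them which is not zero, there is a factorization
$\beta = \beta_1\beta_2$ in $K$ such that $\beta_1 G = U$ and $\beta_2 W = V$ have integral
coefficients in $K$, and

$$\beta GW = F = UV.$$

By the bound just obtained for $\overline{|GW|}$, we have

$$0 < \overline{|F|} \le \overline{|GW|}^{\,2} \le ((r_1+1)\cdots(r_p+1))^{2l}\,(l!)^2\,2^{2(r_1+\cdots+r_p)l}\,B^{2l}.$$
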
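4-4 The index. Let $P(z_1, \ldots, z_p)$ be any polynomial in $p$ variables
which does not vanish identically. Let $\alpha_1, \ldots, \alpha_p$ be any
complex numbers, and let $r_1, \ldots, r_p$ be any positive numbers. We
define the index $\theta$ of $P$ at the point $(\alpha_1, \ldots, \alpha_p)$ relative to $r_1, \ldots, r_p$
as follows. Expand $P(\alpha_1 + y_1, \ldots, \alpha_p + y_p)$ as a polynomial in
$y_1, \ldots, y_p$, say

$$P(\alpha_1 + y_1, \ldots, \alpha_p + y_p) = \sum_{j_1=0}^{\infty} \cdots \sum_{j_p=0}^{\infty} c(j_1, \ldots, j_p)\, y_1^{j_1} \cdots y_p^{j_p}.$$

Then

$$\theta = \min\left(\frac{j_1}{r_1} + \cdots + \frac{j_p}{r_p}\right),$$

the minimum being extended over all sets of non-negative integers
$j_1, \ldots, j_p$ for which $c(j_1, \ldots, j_p) \ne 0$, or, equivalently, for which

$$\left(\frac{\partial}{\partial z_1}\right)^{j_1} \cdots \left(\frac{\partial}{\partial z_p}\right)^{j_p} P(\alpha_1, \ldots, \alpha_p) \ne 0.$$

Note that $\theta \ge 0$ always, and that $\theta = 0$ if and only if
$P(\alpha_1, \ldots, \alpha_p) \ne 0$.
Moreover, if any derived polynomial

$$\left(\frac{\partial}{\partial z_1}\right)^{k_1} \cdots \left(\frac{\partial}{\partial z_p}\right)^{k_p} P(z_1, \ldots, z_p)$$

is not identically zero, it is clear that its index at $(\alpha_1, \ldots, \alpha_p)$ relative
to $r_1, \ldots, r_p$ is at least

$$\theta - \frac{k_1}{r_1} - \cdots - \frac{k_p}{r_p}.$$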
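The definition of the index can be checked mechanically on small examples. The following is a minimal Python sketch and not part of the original text: the dictionary representation of a polynomial and the helper names `shift` and `index` are illustrative assumptions, and the point is taken to be rational and the numbers $r_j$ integral so that exact arithmetic is available.

```python
from fractions import Fraction
from math import comb


def shift(poly, point):
    """Expand P(a_1 + y_1, ..., a_p + y_p) in powers of y_1, ..., y_p.

    `poly` maps exponent tuples (s_1, ..., s_p) to coefficients
    (a hypothetical representation chosen for this sketch).
    """
    shifted = {}
    for exps, coeff in poly.items():
        # Expand each factor (a_i + y_i)^{s_i} by the binomial theorem.
        terms = {(): Fraction(coeff)}
        for s, a in zip(exps, point):
            new_terms = {}
            for js, c in terms.items():
                for j in range(s + 1):
                    new_terms[js + (j,)] = c * comb(s, j) * Fraction(a) ** (s - j)
            terms = new_terms
        for js, c in terms.items():
            shifted[js] = shifted.get(js, 0) + c
    # Keep only the coefficients c(j_1, ..., j_p) that are not zero.
    return {js: c for js, c in shifted.items() if c != 0}


def index(poly, point, r):
    """Index of a nonzero polynomial at `point` relative to the integers r."""
    coeffs = shift(poly, point)
    return min(sum(Fraction(j, rj) for j, rj in zip(js, r)) for js in coeffs)


# P(z1, z2) = (z1 - 1)^2 (z2 - 2), written out in powers of z1 and z2.
P = {(2, 1): 1, (2, 0): -2, (1, 1): -2, (1, 0): 4, (0, 1): 1, (0, 0): -2}
print(index(P, (1, 2), (4, 3)))
print(index(P, (0, 0), (4, 3)))
```

For this $P$ the expansion about $(1, 2)$ is $y_1^2 y_2$, so the index relative to $4, 3$ is $2/4 + 1/3 = 5/6$, while at $(0, 0)$, where $P$ does not vanish, the index is 0.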


The following properties, which we list in a theorem for later refer- 
ence, are also immediate consequences of the definition. 


THEOREM 4-9. Let $P(z_1, \ldots, z_p)$ and $Q(z_1, \ldots, z_p)$ be polynomials,
neither of which vanishes identically. Then if we consider
indices formed at the same point $(\alpha_1, \ldots, \alpha_p)$ relative to the same
numbers $r_1, \ldots, r_p$, the following relations hold:

$$\operatorname{index}(P + Q) \ge \min(\operatorname{index} P, \operatorname{index} Q), \tag{16}$$

$$\operatorname{index} PQ = \operatorname{index} P + \operatorname{index} Q. \tag{17}$$

Equation (17) remains true if $P$ is a polynomial in $z_1, \ldots, z_{p-1}$ only,
and $Q$ is a polynomial in $z_p$ only, provided that the index of $P$ is
taken at $(\alpha_1, \ldots, \alpha_{p-1})$ relative to $r_1, \ldots, r_{p-1}$, and that of $Q$ at
$\alpha_p$ relative to $r_p$.
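A small worked instance may help fix the two relations. It is offered only as an illustration, with values chosen here rather than taken from the text ($p = 1$, $\alpha_1 = 0$, $r_1 = 2$); it shows that (16) can be a strict inequality while (17) is an equality.

```latex
P(z) = z, \qquad Q(z) = z^2 - z, \qquad
\operatorname{index} P = \operatorname{index} Q = \tfrac12, \\
\operatorname{index}(P + Q) = \operatorname{index} z^2 = 1 > \tfrac12
  = \min(\operatorname{index} P, \operatorname{index} Q), \\
\operatorname{index} PQ = \operatorname{index}(z^3 - z^2) = 1
  = \operatorname{index} P + \operatorname{index} Q.
```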




Now let $r_1, \ldots, r_m$ be positive integers, and suppose that $B \ge 1$.
We consider the set $\mathcal{R}_m = \mathcal{R}_m(B; r_1, \ldots, r_m)$ of polynomials
$R(z_1, \ldots, z_m)$ which satisfy the following conditions:

(a) $R$ has integral coefficients in $K$, and is not identically zero.

(b) $R$ is of degree at most $r_j$ in $z_j$, for $j = 1, \ldots, m$.

(c) $\overline{|R|} \le B$.

Let $\zeta_1, \ldots, \zeta_m$ be algebraic numbers (not necessarily in $K$) of
heights $H(\zeta_1) = q_1, \ldots, H(\zeta_m) = q_m$. Let $\theta(R)$ denote the index
of $R(z_1, \ldots, z_m)$ at the point $(\zeta_1, \ldots, \zeta_m)$ relative to $r_1, \ldots, r_m$.
Our object in the present section is to obtain, under certain conditions,
an upper bound for $\theta(R)$ in terms of $B, q_1, \ldots, q_m, r_1, \ldots, r_m$.
We therefore define

$$\Theta_m(B; q_1, \ldots, q_m; r_1, \ldots, r_m) = \sup \theta(R), \tag{18}$$

the supremum, or least upper bound, being taken over all $R$ in $\mathcal{R}_m$
and all integers $\zeta_1, \ldots, \zeta_m$ of heights $q_1, \ldots, q_m$, respectively.

The double significance of $r_1, \ldots, r_m$ in the definition (18) should
be noted; these numbers occur both in the definition of the index
and in condition (b) above.

We proceed by induction on $m$. In Theorem 4-10 the case $m = 1$
is treated, in Theorem 4-11 there is given a recurrence relation
between $\Theta_{m-1}$ and $\Theta_m$, and in Theorem 4-12 an explicit bound is
obtained.


THEOREM 4-10.

$$\Theta_1(B; q_1; r_1) \le \frac{3N(N+1)}{\log q_1} + \frac{N \log B}{r_1 \log q_1}.$$


Proof: Let the defining polynomial of $\zeta_1$ be

$$\chi(z_1) = d_0 z_1^{h} + \cdots + d_h, \qquad d_0 \ne 0,$$

where $d_0, \ldots, d_h$ are relatively prime rational integers, so that

$$\|\chi\| = H(\zeta_1) = q_1 = \max(|d_0|, \ldots, |d_h|).$$


Each polynomial $R$ in $\mathcal{R}_1$ has integral coefficients in $K$; regarding
these coefficients as polynomials in a single primitive element, we
can obtain other polynomials from $R$ by successively replacing this
primitive element throughout by its various conjugates. Let $R^*$
be the product of these $N$ polynomials. By the Symmetric Function
Theorem, $R^*$ has coefficients in $Z$. Also, $\deg R^* \le N r_1$, and by
Theorem 4-5,

|R*|| < GQ +71)*B”. 
By the definition of the index, $R(z_1)$ is divisible by $(z_1 - \zeta_1)^{r_1\theta}$,
and the same is therefore true of $R^*(z_1)$. Since $R^*(z_1)$ has coefficients
in $Z$, it is divisible by $\chi^{r_1\theta}$. One consequence of this fact is
that $h r_1 \theta \le N r_1$. Also, it follows from Theorem 4-3 that


IIx7?|] < 677", 
and, by Theorem 4-4, 
gn = |Ixll" < Gere + 1)6*||R*|| 


< (Nry + 1) tgN1BN < QNr1(N+1) 6 N71 BN 
< 1QVW+D1py 


Hence

$$\theta < \frac{N(N+1) \log 12}{\log q_1} + \frac{N \log B}{r_1 \log q_1},$$

and the theorem follows from the fact that $\log 12 < 3$.
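To get a feeling for the size of the bound in Theorem 4-10, it can be evaluated numerically. The sketch below is an illustration only: the function name `theta1_bound` and the sample values of $N$, $\delta$, and $r_1$ are assumptions made here, and $B$ and $q_1$ are passed through their logarithms because the values of $q_1$ that matter are far too large to store directly. With $B = q_1^{\delta r_1}$, the choice made later in Theorem 4-12, the second term is exactly $N\delta$.

```python
def theta1_bound(N, log_B, log_q1, r1):
    """Right-hand side of Theorem 4-10, with B and q1 given by their logarithms."""
    return 3 * N * (N + 1) / log_q1 + N * log_B / (r1 * log_q1)


# Illustrative values (assumptions of this sketch, not taken from the text).
N, delta, r1 = 2, 0.01, 50
log_q1 = 3 * N * (N + 1) / delta   # log q1 at the borderline size 3 N (N + 1) / delta
log_B = delta * r1 * log_q1        # corresponds to B = q1 ** (delta * r1)

print(theta1_bound(N, log_B, log_q1, r1), (N + 1) * delta)
```

At this borderline value of $\log q_1$ the two printed numbers agree up to rounding; any larger $q_1$ makes the first term smaller, so the bound drops below $(N+1)\delta$.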


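THEOREM 4-11. Let $p \ge 2$ be a positive integer, let $r_1, \ldots, r_p$ be
positive integers such that

$$r_p > 10\delta^{-1}, \qquad \frac{r_{j-1}}{r_j} > \delta^{-1}, \quad \text{for } j = 2, \ldots, p, \tag{19}$$

where $0 < \delta < 1$, and let $q_1, \ldots, q_p$ be positive integers. Then

$$\Theta_p(B; q_1, \ldots, q_p; r_1, \ldots, r_p) \le 2 \max\left(\Phi + \Phi^{1/2} + \delta^{1/2}\right), \tag{20}$$

where the maximum is taken over integers $l$ satisfying

$$1 \le l \le r_p + 1, \tag{21}$$

and where

$$\Phi = \Theta_1(M; q_p; l r_p) + \Theta_{p-1}(M; q_1, \ldots, q_{p-1}; l r_1, \ldots, l r_{p-1}) \tag{22}$$

and

$$M = (r_1 + 1)^{2pl}\, 2^{2 r_1 p l}\, (l!)^2\, B^{2l}. \tag{23}$$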

Proof: Let $R(z_1, \ldots, z_p)$ be any polynomial of the class
$\mathcal{R}_p(B; r_1, \ldots, r_p)$ and let $\zeta_1, \ldots, \zeta_p$ be algebraic numbers of heights
$q_1, \ldots, q_p$ respectively. Then $R$ satisfies the hypotheses of Theorem
4-8, so that there are numbers $l$ and $\beta$ and a polynomial $F(z_1, \ldots, z_p)$
having the properties listed there. By Theorem 4-8,

$$\overline{|F|} \le ((r_1+1)\cdots(r_p+1))^{2l}\,(l!)^2\,2^{2(r_1+\cdots+r_p)l}\,B^{2l},$$

and hence

$$\overline{|F|} \le (r_1+1)^{2pl}\, 2^{2 r_1 p l}\, (l!)^2\, B^{2l} = M,$$

since $r_1 > r_2 > \cdots > r_p$ by (19). From the factorization

$$F(z_1, \ldots, z_p) = U(z_1, \ldots, z_{p-1})\, V(z_p)$$

and the fact that the arguments of $U$ and $V$ are disjoint, it follows
that also

$$\overline{|U|} \le M, \qquad \overline{|V|} \le M.$$

The polynomial $U(z_1, \ldots, z_{p-1})$ has degree at most $l r_j$ in $z_j$, for
$j = 1, \ldots, p-1$. It is therefore an element of the class

$$\mathcal{R}_{p-1}(M; l r_1, \ldots, l r_{p-1}).$$

Hence, its index at $(\zeta_1, \ldots, \zeta_{p-1})$ relative to $l r_1, \ldots, l r_{p-1}$ is at most

$$\Theta_{p-1}(M; q_1, \ldots, q_{p-1}; l r_1, \ldots, l r_{p-1}).$$

It follows from the definition of the index that the index of $U$ at that
point relative to $r_1, \ldots, r_{p-1}$ is at most

$$l\, \Theta_{p-1}(M; q_1, \ldots, q_{p-1}; l r_1, \ldots, l r_{p-1}).$$

Similarly, $V(z_p)$ is an element of the class $\mathcal{R}_1(M; l r_p)$, and its index
at $\zeta_p$ relative to $r_p$ is at most

$$l\, \Theta_1(M; q_p; l r_p).$$

By the last sentence of Theorem 4-9, the index of $F = UV$ at
$(\zeta_1, \ldots, \zeta_p)$ relative to $r_1, \ldots, r_p$ is the sum of the indices of $U$ and
$V$, so that

$$\operatorname{index} F \le l\Phi, \tag{24}$$

where $\Phi$ is defined in (22).

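We now deduce from the determinantal representation of $F$ in
equation (10) a lower bound for the index of $F$ in terms of the index
$\theta$ of $R$. Consider first any differential operator of the form

$$\Delta = \frac{1}{i_1! \cdots i_{p-1}!} \left(\frac{\partial}{\partial z_1}\right)^{i_1} \cdots \left(\frac{\partial}{\partial z_{p-1}}\right)^{i_{p-1}}$$

of order

$$w = i_1 + \cdots + i_{p-1} \le l - 1.$$

If the polynomial

$$\Delta \frac{1}{\nu!}\left(\frac{\partial}{\partial z_p}\right)^{\nu} R(z_1, \ldots, z_p)$$

does not vanish identically, its index at $(\zeta_1, \ldots, \zeta_p)$ relative to
$r_1, \ldots, r_p$ is at least

$$\theta - \frac{i_1}{r_1} - \cdots - \frac{i_{p-1}}{r_{p-1}} - \frac{\nu}{r_p} \ge \theta - \frac{w}{r_{p-1}} - \frac{\nu}{r_p}.$$

Now

$$\frac{w}{r_{p-1}} \le \frac{l-1}{r_{p-1}} \le \frac{r_p}{r_{p-1}} < \delta,$$

by the inequalities (21) and (19). Hence, since the index is non-negative,
it must be at least

$$\max\left(0,\ \theta - \delta - \frac{\nu}{r_p}\right) \ge \max\left(0,\ \theta - \frac{\nu}{r_p}\right) - \delta.$$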


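If we expand the determinant on the right side of (10), we obtain
for $F$ a sum of $l!$ terms, a typical term being

$$\pm \beta\, (\Delta_{\mu_0} R) \left(\Delta_{\mu_1} \frac{1}{1!} \frac{\partial}{\partial z_p} R\right) \cdots \left(\Delta_{\mu_{l-1}} \frac{1}{(l-1)!} \left(\frac{\partial}{\partial z_p}\right)^{l-1} R\right),$$

where $\Delta_{\mu_0}, \ldots, \Delta_{\mu_{l-1}}$ are differential operators on $z_1, \ldots, z_{p-1}$ whose
orders are at most $l - 1$. By Theorem 4-9, the index of such a term,
if it does not vanish identically, is at least

$$\sum_{\nu=0}^{l-1} \max\left(0,\ \theta - \frac{\nu}{r_p}\right) - l\delta.$$

Since $F$ is a sum of such terms, it follows from Theorem 4-9 again that

$$\operatorname{index} F \ge \sum_{\nu=0}^{l-1} \max\left(0,\ \theta - \frac{\nu}{r_p}\right) - l\delta.$$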


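We may suppose that $\theta r_p > 10$, since otherwise

$$\theta \le 10 r_p^{-1} < \delta < 2\delta^{1/2}$$

and the desired inequality for $\theta$ then holds. Under this supposition,
$[\theta r_p]^2 > 2\theta^2 r_p^2/3$. Hence if $\theta r_p < l$, we have

$$\sum_{\nu=0}^{l-1} \max\left(0,\ \theta - \frac{\nu}{r_p}\right) = r_p^{-1} \sum_{\nu=0}^{[\theta r_p]} (\theta r_p - \nu) \ge \tfrac12 r_p^{-1} [\theta r_p]^2 \ge \tfrac13 r_p \theta^2,$$

while if $\theta r_p \ge l$, then

$$\sum_{\nu=0}^{l-1} \max\left(0,\ \theta - \frac{\nu}{r_p}\right) = \sum_{\nu=0}^{l-1} \left(\theta - \frac{\nu}{r_p}\right) \ge \tfrac12 l\theta.$$

Hence

$$\operatorname{index} F \ge \min\left(\tfrac12 l\theta,\ \tfrac13 r_p\theta^2\right) - l\delta. \tag{25}$$

Combining (24) and (25), we obtain

$$\min\left(\tfrac12 l\theta,\ \tfrac13 r_p\theta^2\right) \le l(\Phi + \delta).$$

Thus either $\theta \le 2(\Phi + \delta)$, in which case $\theta$ satisfies the desired
inequality, or

$$\tfrac13 r_p\theta^2 \le l(\Phi + \delta) \le (r_p + 1)(\Phi + \delta).$$

Since $r_p + 1 < 4r_p/3$ by (19), this gives

$$\theta < 2(\Phi + \delta)^{1/2} \le 2(\Phi^{1/2} + \delta^{1/2}),$$

and the proof is complete.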
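THEOREM 4-12. Let $m$ be a positive integer, and suppose that

$$0 < \delta < \frac{1}{m\, 2^{m} (N+1)^2}. \tag{26}$$

Let $r_1, \ldots, r_m$ be positive integers such that

$$r_m > 10\delta^{-1}, \qquad \frac{r_{j-1}}{r_j} > \delta^{-1}, \quad \text{for } j = 2, \ldots, m. \tag{27}$$

Let $q_1, \ldots, q_m$ be positive integers such that

$$\log q_1 > 2\delta^{-1} m(2m + 1), \tag{28}$$

$$r_j \log q_j \ge r_1 \log q_1, \quad \text{for } j = 2, \ldots, m, \tag{29}$$

$$\log q_1 > 3\delta^{-1} N(N + 1). \tag{30}$$

Then

$$\Theta_m(q_1^{\delta r_1}; q_1, \ldots, q_m; r_1, \ldots, r_m) < 10^{m} \delta^{(1/2)^m}. \tag{31}$$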




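Proof: The proof is by induction on $m$. For $m = 1$, we apply
Theorem 4-10, together with the inequalities (30) and (26), and
obtain

$$\Theta_1(q_1^{\delta r_1}; q_1; r_1) \le \frac{3N(N+1)}{\log q_1} + \frac{N \log (q_1^{\delta r_1})}{r_1 \log q_1} < (N + 1)\delta < 10\delta^{1/2},$$

which is the desired inequality.

Now suppose that $p \ge 2$ is an integer, and that the theorem holds
when $m = p - 1$. When $m = p$, the hypotheses of the present
theorem are more stringent than those of Theorem 4-11, so that the
latter is applicable here. We must estimate $M$ and $\Phi$.

We have

$$M = (r_1 + 1)^{2pl}\, 2^{2 r_1 p l}\, (l!)^2\, B^{2l} \le \left((r_1 + 1)^{2p}\, 2^{2 r_1 p}\, l^2\, q_1^{2\delta r_1}\right)^{l}.$$

Since $l \le r_p + 1 \le r_1 + 1 \le 2^{r_1}$, it follows that

$$M \le \left(2^{(4p+2) r_1} q_1^{2\delta r_1}\right)^{l} < \left(e^{(4p+2) r_1} q_1^{2\delta r_1}\right)^{l}.$$

By (28) with $m = p$, we have $4p + 2 < \delta p^{-1} \log q_1$, so that

$$M < q_1^{\delta_1 l r_1},$$

where

$$\delta_1 = 2\delta(1 + p^{-1}). \tag{32}$$

Thus

$$\Theta_1(M; q_p; l r_p) \le \Theta_1(q_1^{\delta_1 l r_1}; q_p; l r_p) \tag{33}$$

and

$$\Theta_{p-1}(M; q_1, \ldots, q_{p-1}; l r_1, \ldots, l r_{p-1}) \le \Theta_{p-1}(q_1^{\delta_1 l r_1}; q_1, \ldots, q_{p-1}; l r_1, \ldots, l r_{p-1}). \tag{34}$$

Moreover, (32), together with the inequality (26) with $m = p$,
implies that

$$\delta_1 < \frac{1 + p^{-1}}{p\, 2^{p-1} (N+1)^2} < \frac{1}{(p-1)\, 2^{p-1} (N+1)^2}. \tag{35}$$

In particular, $(N + 1)\delta_1 < \delta_1^{1/2}$.
It follows from (30), and the fact that $q_p \ge q_1$, that

$$\log q_p > 3\delta^{-1} N(N + 1).$$

Hence by Theorem 4-10, the right side of (33) does not exceed

$$\frac{3N(N+1)}{\log q_p} + \frac{N \delta_1 l r_1 \log q_1}{l r_p \log q_p} < \delta + N\delta_1 < (N + 1)\delta_1 < \delta_1^{1/2};$$

here we have used (29).

To estimate the right side of (34), we use the induction hypothesis,
that the theorem holds when $m = p - 1$. The conditions of the
theorem are satisfied for $m = p - 1$, if we replace $\delta$ by $\delta_1$ and
$r_1, \ldots, r_{p-1}$ by $l r_1, \ldots, l r_{p-1}$; since $\delta_1 > \delta$, this is obvious for all
the relations but (26), which has already been verified in (35). It
follows that

$$\Theta_{p-1}(q_1^{\delta_1 l r_1}; q_1, \ldots, q_{p-1}; l r_1, \ldots, l r_{p-1}) < 10^{p-1} \delta_1^{(1/2)^{p-1}}.$$

Hence, since $\delta_1 < 4\delta$, the two results just proved imply that

$$\Phi < 2\delta^{1/2} + 2\left(10^{p-1} \delta^{(1/2)^{p-1}}\right) < 3\left(10^{p-1} \delta^{(1/2)^{p-1}}\right).$$

Finally, (20) gives

$$\Theta_p(q_1^{\delta r_1}; q_1, \ldots, q_p; r_1, \ldots, r_p) < 2\left\{3\left(10^{p-1} \delta^{(1/2)^{p-1}}\right) + 3^{1/2}\, 10^{(p-1)/2}\, \delta^{(1/2)^{p}} + \delta^{1/2}\right\}$$

$$< 2\left(\frac{3}{10} + \frac{3^{1/2}}{10^{3/2}} + \frac{1}{10^{2}}\right) 10^{p} \delta^{(1/2)^{p}} < 10^{p} \delta^{(1/2)^{p}}.$$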


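4-5 A combinatorial lemma


THEOREM 4-13. If $r_1, \ldots, r_m$ are any positive integers, and $\lambda > 0$,
then the number $A_m(\lambda)$ of sets of integers $j_1, \ldots, j_m$ which satisfy the
inequalities

$$0 \le j_1 \le r_1, \quad \ldots, \quad 0 \le j_m \le r_m,$$

$$\frac{j_1}{r_1} + \cdots + \frac{j_m}{r_m} \le \tfrac12 (m - \lambda)$$

does not exceed

$$2 m^{1/2} \lambda^{-1} (r_1 + 1) \cdots (r_m + 1).$$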
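The count $A_m(\lambda)$ can be computed by brute force for small parameters and compared with the stated bound. The following Python sketch is an illustration only; the function names and the sample degrees are assumptions made here, and for such small $m$ the bound is far from sharp.

```python
from fractions import Fraction
from itertools import product
from math import prod, sqrt


def count_tuples(r, lam):
    """Count integer tuples with 0 <= j_i <= r_i and sum j_i / r_i <= (m - lam) / 2."""
    m = len(r)
    limit = (Fraction(m) - Fraction(lam)) / 2
    return sum(
        1
        for js in product(*(range(ri + 1) for ri in r))
        if sum(Fraction(j, ri) for j, ri in zip(js, r)) <= limit
    )


def lemma_bound(r, lam):
    """The upper bound 2 m^(1/2) lam^(-1) (r_1 + 1) ... (r_m + 1)."""
    m = len(r)
    return 2 * sqrt(m) / float(lam) * prod(ri + 1 for ri in r)


r = (3, 4, 5, 6, 7)  # illustrative degrees chosen for this sketch
for lam in (1, 2, 3, Fraction(9, 2)):
    print(lam, count_tuples(r, lam), lemma_bound(r, lam))
```

Exact rational arithmetic is used in the count so that tuples lying exactly on the boundary $\sum j_i/r_i = \tfrac12(m - \lambda)$ are not lost to rounding.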


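Proof: We proceed by induction on $m$. The theorem holds for
$m = 1$, since the number of integers $j_1$ such that

$$0 \le j_1 \le r_1, \qquad j_1 \le \tfrac12 (1 - \lambda) r_1$$

is at most $r_1 + 1$, and is 0 if $\lambda > 1$.

Now suppose $m > 1$. The result is trivial if $\lambda \le 2m^{1/2}$, since then
the conditions on the individual $j$'s give an improvement of the desired
upper bound. Hence we may suppose that $\lambda > 2m^{1/2}$. If we fix $j_m$,
we must count the sets of integers $j_1, \ldots, j_{m-1}$ such that

$$0 \le j_1 \le r_1, \quad \ldots, \quad 0 \le j_{m-1} \le r_{m-1},$$

$$\frac{j_1}{r_1} + \cdots + \frac{j_{m-1}}{r_{m-1}} \le \tfrac12 \left(m - \lambda - \frac{2 j_m}{r_m}\right).$$

Putting

$$m - \lambda - \frac{2 j_m}{r_m} = (m - 1) - \lambda',$$

or, what is the same thing,

$$\lambda' = \lambda - 1 + \frac{2 j_m}{r_m},$$

we see that

$$A_m(\lambda) = \sum_{j_m=0}^{r_m} A_{m-1}(\lambda').$$

By the induction hypothesis,

$$A_m(\lambda) \le 2(m - 1)^{1/2} (r_1 + 1) \cdots (r_{m-1} + 1) \sum_{j=0}^{r_m} \left(\lambda - 1 + \frac{2j}{r_m}\right)^{-1},$$

and it suffices to prove that

$$\sum_{j=0}^{r} \left(\lambda - 1 + \frac{2j}{r}\right)^{-1} \le \lambda^{-1} (m - 1)^{-1/2} m^{1/2} (r + 1)$$

for all positive integers $r$ and $m$, if $\lambda > 2m^{1/2}$.
If $r$ is even, we put $j = \tfrac12 r + k$ and obtain the sum

$$\sum_{k=-r/2}^{r/2} \left(\lambda + \frac{2k}{r}\right)^{-1} = \lambda^{-1} + \sum_{k=1}^{r/2} \left\{\left(\lambda + \frac{2k}{r}\right)^{-1} + \left(\lambda - \frac{2k}{r}\right)^{-1}\right\}$$

$$= \lambda^{-1} + \sum_{k=1}^{r/2} 2\lambda \left(\lambda^2 - \frac{4k^2}{r^2}\right)^{-1}$$

$$\le \lambda^{-1} + \sum_{k=1}^{r/2} 2\lambda (\lambda^2 - 1)^{-1}$$

$$= \lambda^{-1} + 2\lambda^{-1} \sum_{k=1}^{r/2} (1 - \lambda^{-2})^{-1}$$

$$\le \lambda^{-1} (1 - \lambda^{-2})^{-1} (r + 1).$$

Since $1 - \lambda^{-2} > 1 - m^{-1}/4 > (1 - m^{-1})^{1/2}$, we have the desired
inequality.

If $r$ is odd, we put $j = (r - 1)/2 + k$ and obtain the sum

$$\sum_{k=-(r-1)/2}^{(r+1)/2} \left(\lambda + \frac{2k - 1}{r}\right)^{-1} = \sum_{k=1}^{(r+1)/2} \left\{\left(\lambda + \frac{2k - 1}{r}\right)^{-1} + \left(\lambda - \frac{2k - 1}{r}\right)^{-1}\right\}$$

$$= \sum_{k=1}^{(r+1)/2} 2\lambda \left(\lambda^2 - \left(\frac{2k - 1}{r}\right)^2\right)^{-1}$$

$$\le \lambda (\lambda^2 - 1)^{-1} (r + 1),$$

and the result is as before.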


4-6 The approximation polynomial. Let $\alpha$ be an algebraic integer
of degree $n \ge 2$ over $K$, so that $\alpha$ is a zero of a polynomial which has
integral coefficients in $K$ and which cannot be factored into a product
of such polynomials of positive degrees. Let $L = K(\alpha)$ be the field
obtained by adjoining $\alpha$ to $K$. Finally, let $\omega_1, \ldots, \omega_N$ be an integral
basis for $K$, and put

$$\overline{|\alpha|} = b_1, \qquad \max\left(\overline{|\omega_1|}, \ldots, \overline{|\omega_N|}\right) = b_2. \tag{36}$$

In the remainder of the proof we shall be concerned with a single
set of values of $m, \delta, q_1, \zeta_1, \ldots, q_m, \zeta_m, r_1, \ldots, r_m$, which will be
chosen later in the order just specified. The choice will be made so
as to satisfy the following conditions:

$$0 < \delta < m^{-1} 2^{-m} (N + 1)^{-2}, \tag{37}$$

$$10^{m} \delta^{(1/2)^m} + 2(1 + 3\delta) n m^{1/2} < \tfrac12 m, \tag{38}$$

$$r_m > 10\delta^{-1}, \qquad \frac{r_{j-1}}{r_j} > \delta^{-1}, \quad \text{for } j = 2, \ldots, m, \tag{39}$$

$$\delta^2 \log q_1 > 2m + 1 + m \log (b_1 + 1) + 4 b_2 N, \tag{40}$$

$$r_j \log q_j \ge r_1 \log q_1, \quad \text{for } j = 2, \ldots, m, \tag{41}$$

$$\log q_1 > 3\delta^{-1} N(N + 1). \tag{42}$$

Notice that these conditions imply those of Theorem 4-12, since (37)
and (40) together imply that $\delta \log q_1 > 2m(2m + 1)$.


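Define $\lambda, \mu, \eta, B_1$ by the equations

$$\lambda = 4(1 + 3\delta) n m^{1/2}, \tag{43}$$

$$\mu = \tfrac12 (m - \lambda), \tag{44}$$

$$\eta = 10^{m} \delta^{(1/2)^m}, \tag{45}$$

$$B_1 = [q_1^{\delta r_1}]. \tag{46}$$

Then (38) is equivalent to

$$\eta < \mu.$$

Also,

$$q_1^{\delta r_1/2} < B_1, \tag{47}$$

since $\sqrt{x} < x - 1 < [x]$ for all $x > (3 + \sqrt{5})/2$, and

$$q_1^{\delta r_1} > q_1^{\delta^2 r_1} > e^{(2m+1) r_1} > e^{3 r_m} > e^{30}.$$

We come now to the main lemma, which will be the only one to
which reference is made in the eventual proof of the Thue-Siegel-Roth
theorem.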


THEOREM 4-14. Suppose that the conditions (37) through (42) are
satisfied, and suppose that $\zeta_1, \ldots, \zeta_m$ are algebraic numbers of
heights $q_1, \ldots, q_m$, respectively. Then there exists a polynomial
$Q(z_1, \ldots, z_m)$ with integral coefficients in $K$ and of degree at most
$r_j$ in $z_j$, for $j = 1, \ldots, m$, such that

(a) the index of $Q$ at the point $(\alpha, \ldots, \alpha)$ relative to $r_1, \ldots, r_m$ is
at least $\mu - \eta$;

(b) $Q(\zeta_1, \ldots, \zeta_m) \ne 0$;

(c) for all derivatives

$$Q_{i_1 \ldots i_m}(z_1, \ldots, z_m) = \frac{1}{i_1! \cdots i_m!} \left(\frac{\partial}{\partial z_1}\right)^{i_1} \cdots \left(\frac{\partial}{\partial z_m}\right)^{i_m} Q,$$

where $i_1, \ldots, i_m$ are non-negative integers, the inequality

$$|Q_{i_1 \ldots i_m}(z_1, \ldots, z_m)| < B_1^{1+3\delta} (1 + |z_1|)^{r_1} \cdots (1 + |z_m|)^{r_m}$$

holds, and the corresponding inequality also holds if the coefficients in
$Q$ are replaced by their respective field conjugates.


Proof: Let $c_1, \ldots, c_N$ range independently over the non-negative
rational integers not exceeding $B_1$, and let $C$ be the set of integers of
$K$ of the form

$$c_1\omega_1 + \cdots + c_N\omega_N.$$

The number of elements of $C$ is $(1 + B_1)^N$, and if we put

$$(1 + r_1) \cdots (1 + r_m) = r,$$

there are

$$(1 + B_1)^{Nr} \tag{48}$$

distinct polynomials

$$P(z_1, \ldots, z_m) = \sum_{s_1=0}^{r_1} \cdots \sum_{s_m=0}^{r_m} \gamma(s_1, \ldots, s_m)\, z_1^{s_1} \cdots z_m^{s_m}$$

whose coefficients $\gamma(s_1, \ldots, s_m)$ belong to $C$. For $\gamma(s_1, \ldots, s_m)$ in $C$,

$$\overline{|\gamma(s_1, \ldots, s_m)|} \le b_2 B_1 N, \tag{49}$$


and if we put 


Piqes-jm(@1y + + + 5 2m) 


1 3 ji a \In 
-—— (2) (2) PZ cvs Ze) 
ni: : Im! 021 02m, 


TI 


= au) m—J 
— eee * eee - 1—J1 eee —sm 
RA De VSiy +» Sm) Ii we “1 em . ’ 


sm= 


then 
IP 3,30 < grit +m BIN < boN o""1B, < boNB,}*°, 
since mr, log 2 < 48?r, log gq: by (40). Now replace all of 21, ... , 2m 
by a. Since the total number of terms is at most 7, and since, by (40), 
r= (ri +1)--- (tm +1) < QI < (by + 1)" < By’, (50) 
we obtain the bound 
[Pj ---jm (Qs sang a) | < boN By trby it tem 
< boNB,t*?. 
Let 3 be a primitive element of L, so that L = R(s). Order the 
conjugates of 3 so that 31, ..., 8), are real and 3,4, and 3,4 .45 are 
complex-conjugate for vy = 1,..., p2, so that p1 + 2p2 = nN. Lett 


be a fixed one of the numbers Pj,...;,,(a,.--,%), where j,...)Jm 
satisfy the inequalities 


O<fiSrn, +++) OS Gm <1, te $< (51) 
1 m 


Then ¢ can be written as a polynomial in #, with rational coefficients, 




and as such has field conjugates ¢™, »v = 1,...,nN. Hence we can 
define nN real numbers &),..., nn by the equations 


&=§, forv=1,..., 1, 
Ey + IE 4p = ae for pp +1 << pi + po. 


Collecting them in a fixed order for fixed coefficients y(s1, ..., Sm) 
and for all j1,...,jm satisfying the inequalities (51), we have a set 
of numbers which can be considered as coordinates of a point; by 
Theorem 4-13 there are 


M < 2nNm?)"r 


coordinates, and each is numerically smaller than [b.NB,!***]+1 =¢. 
Thus all the points, for the various sets of coefficients in C, lie in a 
cube of edge 2¢ in M-dimensional space. If each edge is divided into 
3t equal parts, we get (3t)” subcubes of edge 2. By (48), if 


(1+ By)" > (34), (52) 
there are more points than subcubes, and the points corresponding 
to two different polynomials P*(z,,...,2%m) and P**(z,..., 2m) lie 


in the same subcube. If we put 


PGipnsti@y) HP * Ayzexgen) =P Ciyeersgen) 
then 


Pega Qa 


~ 2 
., a) N20 <1 


for ji,---,jm aS in (51). Since P,j,...;,,(a,...,@) is an algebraic 
integer whose norm is numerically smaller than 1, it must be zero. 
Hence the index of P at the point (a,...,«@) relative to 71,..., 1m 
is at least ». Also the coefficients ¥(s1,..., Sm) in P are integers of 
K, not all zero, such that the relation (49) holds. 
To verify (52), notice that by the inequality (40), 
qo” > 4boN ? 
and hence 
By, > 4b2N, 


B,¥" > (4b.NB,)29", 
BN > (3b.N B,1+38 an 3)2N7(1438)™, 
(1+ B,)*" > Bn”. 




We now apply Theorem 4-12, the hypotheses of which are 
satisfied, as was noted earlier. Since P belongs to the class 
Rm (qr; 11, .-~ 5 Tm), its index at (f1,..., {m) relative tory, ..., 1m 


is less than n, defined in (45). Hence P possesses some derivative 


1 a \*t a \km _ 
Ole, --52m) = (+) (2) P, 


k Kes 
4... pcg 


ry Tm 


with 


such that 
QS, eee » Sm) ~ 0. 


The index of Q at the point (a,...,a) relative to r1,..., 1m is at 
least p — n. Thus Q has the properties (a) and (b) of Theorem 4-14. 
From the relations (49) and (50), 


[Q] < git tmpNB, < 2""bNB, < boNB,}+5, 
Hence for an arbitrary derivative, 


| < grt +m NB, +8 < boN By3*26, 
Finally, 


linnim (@ty «++» 2m)| < boNBi% TT (A + las] +--+ + lest”) 
< boNB,1t?8 Il (1 + |z,|)” 
y=l1 


< By +8 TT (1 + |z,)”, 
p=l 


since b,N < B,*® by (40). The same inequality holds for the con- 
jugate polynomials, and the proof is complete. 


4-7 The Thue-Siegel-Roth theorem


THEOREM 4-15. Let $K$ be an algebraic number field of degree $N$,
and let $\alpha$ be algebraic. Then for each $\kappa > 2$, the inequality

$$|\alpha - \zeta| < \frac{1}{(H(\zeta))^{\kappa}} \tag{53}$$

has only finitely many solutions $\zeta$ in $K$.
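The content of the theorem can be glimpsed numerically in the simplest case. The sketch below is only an illustration, under assumptions made here: it takes $K$ to be the rational field (so $N = 1$), $\alpha = \sqrt{2}$, and uses the fact that a reduced fraction $p/q$ has defining polynomial $qx - p$ and hence height $\max(|p|, |q|)$. Floating-point arithmetic is adequate for the small denominators searched. The function names are hypothetical.

```python
from fractions import Fraction
from math import gcd, sqrt


def height(p, q):
    """Height of the reduced fraction p/q: its defining polynomial is q*x - p."""
    return max(abs(p), abs(q))


def solutions(kappa, max_q):
    """Reduced fractions p/q, 1 <= q <= max_q, with |sqrt(2) - p/q| < H(p/q)^(-kappa)."""
    alpha = sqrt(2)
    found = []
    for q in range(1, max_q + 1):
        # Fractions outside [0, 3] are too far from sqrt(2) to qualify.
        for p in range(0, 3 * q + 1):
            if gcd(p, q) != 1:
                continue
            if abs(alpha - p / q) < 1 / height(p, q) ** kappa:
                found.append(Fraction(p, q))
    return found


for bound in (20, 200):
    print(bound, len(solutions(2.0, bound)), len(solutions(2.5, bound)))
```

At the borderline exponent $\kappa = 2$, which the theorem excludes, the convergents $1, 3/2, 7/5, 17/12, \ldots$ of $\sqrt{2}$ keep supplying new solutions as the search bound grows, whereas for $\kappa = 2.5$ the list stops growing, as the theorem predicts it eventually must.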




Proof: We shall suppose that the theorem is false, so that (53) has
infinitely many solutions, and produce a contradiction. We may
suppose also that $\alpha$ is an integer. For if not there is a positive
rational integer $a$ such that $a\alpha$ is an algebraic integer, and for each
solution $\zeta$ of (53) we have

$$|a\alpha - a\zeta| < \frac{a}{(H(\zeta))^{\kappa}} \le \frac{a^{\kappa N + 1}}{(H(a\zeta))^{\kappa}}.$$

Hence for arbitrary $\varepsilon > 0$, and for all solutions $\zeta$ with $H(\zeta)$
sufficiently large,

$$|a\alpha - a\zeta| < \frac{1}{(H(a\zeta))^{\kappa - \varepsilon}},$$

and $\varepsilon$ can be chosen so small that $\kappa - \varepsilon > 2$.

Finally, it suffices to prove that (53) has only finitely many solutions
in primitive elements $\zeta$ of $K$. For an algebraic number field
has only finitely many subfields, and every element of $K$ is a primitive
element of some one of its subfields; moreover, the inequality in question
does not depend on the degree of $\alpha$ over $K$.

We first choose $m$ so large that $m > 4nm^{1/2}$ and

$$\frac{2m}{m - 4nm^{1/2}} < \kappa, \tag{54}$$

which is possible since $\kappa > 2$. For sufficiently small $\delta$ we have

$$m - 4(1 + 3\delta) n m^{1/2} - 2\eta > 0,$$

where $\eta$, given by (45), becomes arbitrarily small with $\delta$. This
condition is the same as that of (38). We choose $\delta$ to satisfy this
and the inequality (37), and finally the inequality

$$\frac{2m(1 + \delta) + 2\delta N(2 + 5\delta)}{m - 4(1 + 3\delta) n m^{1/2} - 2\eta} < \kappa, \tag{55}$$

which is possible in view of (54). The inequality (55) is equivalent to

$$\frac{m(1 + \delta) + \delta N(2 + 5\delta)}{\mu - \eta} < \kappa \tag{56}$$

by equations (43) and (44).


Having chosen $m$ and $\delta$, we now choose a solution $\zeta_1$ of (53) (a
primitive element of $K$) with $H(\zeta_1) = q_1$ and with $q_1$ so large as to
satisfy (40) and (42). We then choose further primitive solutions
$\zeta_2, \ldots, \zeta_m$ of heights $q_2, \ldots, q_m$, such that for $j = 2, \ldots, m$,

$$\frac{\log q_j}{\log q_{j-1}} > \frac{2}{\delta}. \tag{57}$$

We now take $r_1$ to be any integer such that

$$r_1 > \frac{10 \log q_m}{\delta \log q_1}, \tag{58}$$

and define $r_j$, for $j = 2, \ldots, m$, by

$$\frac{r_1 \log q_1}{\log q_j} \le r_j < 1 + \frac{r_1 \log q_1}{\log q_j}. \tag{59}$$

Then the inequality (41) is satisfied. Also,

$$\frac{r_j \log q_j}{r_1 \log q_1} < 1 + \frac{\log q_j}{r_1 \log q_1} \le 1 + \frac{\log q_m}{r_1 \log q_1} < 1 + \frac{\delta}{10} \tag{60}$$

by (58). The conditions (39) are satisfied, since

$$r_m \ge \frac{r_1 \log q_1}{\log q_m} > 10\delta^{-1},$$

and

$$\frac{r_{j-1}}{r_j} > \frac{\log q_j}{\log q_{j-1}} \left(1 + \frac{\delta}{10}\right)^{-1} > \delta^{-1},$$

by (59), (60), and (57).
We know from Theorem 4-14 that there exists a polynomial
$Q(z_1, \ldots, z_m)$, whose properties are listed in that theorem. Let
$\zeta_1, \ldots, \zeta_m$ in $K$ be zeros of irreducible polynomials of degree $N$ with
relatively prime coefficients in $Z$, the coefficients of $z^N$ being
$k_1, \ldots, k_m$, respectively. Then the number

$$\varphi = Q(\zeta_1, \ldots, \zeta_m)$$

is an element of $K$. If the field conjugates of $\zeta_i$ are $\zeta_i', \zeta_i'', \ldots$, for
$i = 1, \ldots, m$, then $N\varphi$ is a sum of products of powers of the $\zeta_i^{(j)}$
with integral coefficients from $K$, and in each such product a factor
$\zeta_i^{(j)}$ occurs to the power $r_i$ at most. In the proof of Theorem 2-21,
it was shown that the product of $k_i$ and any set of distinct conjugates
of $\zeta_i$ is an algebraic integer. For each $i$, the field conjugates of $\zeta_i$
are distinct, because $\zeta_i$ is a primitive element of $K$. It follows that
$k_1^{r_1} \cdots k_m^{r_m} N\varphi$ is an algebraic integer, and since it is also rational
it is a rational integer, so that

$$|k_1^{r_1} \cdots k_m^{r_m} N\varphi| \ge 1. \tag{61}$$

On the other hand, we have

$$Q(\zeta_1, \ldots, \zeta_m) = \sum_{i_1=0}^{r_1} \cdots \sum_{i_m=0}^{r_m} Q_{i_1 \ldots i_m}(\alpha, \ldots, \alpha)\, (\zeta_1 - \alpha)^{i_1} \cdots (\zeta_m - \alpha)^{i_m},$$

and, by part (a) of Theorem 4-14, the terms with

$$\frac{i_1}{r_1} + \cdots + \frac{i_m}{r_m} < \mu - \eta$$

all vanish. In all other terms we have

$$|(\zeta_1 - \alpha)^{i_1} \cdots (\zeta_m - \alpha)^{i_m}| < (q_1^{i_1} \cdots q_m^{i_m})^{-\kappa}$$

$$= \left\{q_1^{r_1 i_1/r_1} (q_2^{r_2/r_1})^{r_1 i_2/r_2} \cdots (q_m^{r_m/r_1})^{r_1 i_m/r_m}\right\}^{-\kappa}$$

$$\le \left(q_1^{r_1 i_1/r_1} \cdots q_1^{r_1 i_m/r_m}\right)^{-\kappa}$$

$$\le q_1^{-r_1(\mu - \eta)\kappa},$$

since $q_j^{r_j/r_1} \ge q_1$ by (41). Hence, using part (c) of Theorem 4-14,
we have

$$|\varphi| \le (r_1 + 1) \cdots (r_m + 1)\, B_1^{1+3\delta} (1 + b_1)^{m r_1} q_1^{-r_1(\mu - \eta)\kappa}$$

$$< B_1^{1+5\delta} q_1^{-r_1(\mu - \eta)\kappa},$$


and by using part (c) again, together with Theorem 4-2, we obtain 
la! = + Ke?™™No| < By2t88gy—1—)* B (ND (1488) 


™ N T? 
x TI {ki II (+ Df 
i= j= 


< BN O45 g,—71 x II (6% q;)%. 
Now, by (50), 
GNt:+ +4) < opNoit-** +m) < B,25% < gponn < gy oN, 


so that 
[Ky - = - Kem™@™Ne| < gyeNnG +58) +38N7 +601 +°°* +1m) —n —2)x 
™m 


< qi 71(2 +58) +mr(1 +5) —ni —)x 


This, together with (61), implies that

$$\delta N(2 + 5\delta) + m(1 + \delta) > (\mu - \eta)\kappa,$$

or

$$\frac{m(1 + \delta) + \delta N(2 + 5\delta)}{\mu - \eta} > \kappa,$$

which contradicts (56). This completes the proof.


4-8 Applications to Diophantine equations. The Thue-Siegel-Roth
theorem will now be applied to show that a rather large variety of
Diophantine equations have only finitely many solutions.


THEOREM 4-16. Let $U(x, y)$ be a binary form of degree $n$, without
multiple linear factors, whose coefficients belong to an algebraic
number field $K_0$ of degree $h$. Let $x$ and $y$ be integral variables of $K_0$.
Suppose that

$$n > 2h.$$

Let $V(x, y)$ be any polynomial of total degree $v < n - 2h$ which has
coefficients in $K_0$ and has no common factor with $U(x, y)$. Then the
equation

$$U(x, y) = V(x, y) \tag{62}$$

has only finitely many solutions.


Proof: Just as in the representation theory for binary quadratic
forms, it makes no difference whether we consider (62) or an equation
obtained from it by a substitution $x = ax' + by'$, $y = cx' + dy'$,
where $a, b, c, d$ are in $Z$, and $|ad - bc| = 1$. If $U(x, y) = a_0 x^n + \cdots + a_n y^n$, then

$$U(x, ax + y) = U(1, a) x^n + \cdots + a_n y^n,$$

$$U(x + by, y) = a_0 x^n + \cdots + U(b, 1) y^n.$$

Choose $a$ in $K_0$ so that $U(1, a) \ne 0$, and put $U(x, ax + y) = U_1(x, y)$.
Then choose $b$ in $K_0$ so that $U_1(b, 1) \ne 0$, and put
$U_1(x + by, y) = U_2(x, y)$. Dropping the subscript, we see that
there is no loss in generality in supposing that the coefficients of
$x^n$ and $y^n$ in $U(x, y)$ are different from zero, and we can write

M 




where neither a nor any & is zero. By assumption, the numbers £, 
are distinct, so if we put 


cy = min (|£; — &), 
jk 


then c; > 0, and for every x and y, at least n — 1 of the factors in the 
product occurring in (63) have absolute values not less than $c}. 

Let x = 7 and y = ¢ ~ O be integers of Ko, with field conjugates 
gq), ...,9™, ¢,...,¢. Then as we saw in Theorem 2-5, 


hk 
IT (¢Pt — 9) = (QMS, 


where Q(é) is an irreducible polynomial with coefficients in Z, and 
1<f<h. Let M = max ([¢l, [nl), and name the conjugates so that 


MS | 
Q@) = IL G9 — 1). 


Then the coefficients of Q(¢) are numerically smaller then the cor- 
responding coefficients of 


hy f 
i (Mt + M), 
ie 


so that ||Q|| < (2M). A fortiori, H(n/s) < (2M). 
Now by Theorem 4-15, there are only finitely many solutions of 
the inequality 
) 
c] ~ HQ/s/Pr* 


for fixed e’ > 0. Hence for M sufficiently large, and « = ¢h, 


1 x 1 

£5] = @MyreRIT = Myr’ 
at least if the left side is not zero. This is certainly true of the solu- 
tions of (62), since U(z,y) and V(z,y) have no common factor. 
The same argument applies to the numbers 7” /s% and &, for 
l<j<shl<k<n; we see that for e>0 and M sufficiently 
large, the inequality 

(7) 

n 

f, — ro) 


je 


n 


1 : ; as 
> OMe’ ae ee ee ee ren 


holds for every solution of (62). There is no loss in generality in sup- 




posing that M = |¢“| = |g], since (62) remains correct after replac- 
ing all quantities by their conjugates and if necessary interchanging 
xandy. Hence, for large M, 


n—I1 
n(a\e od 
On the other hand, there is a constant cy, depending only on the co- 
efficients of V, such that 


[V (n, g)| < ¢2M”. 
If we choose e < n — 2h — », then for sufficiently large M, 
[U(a, 2) > [Vq, 91. 


But a bound on M implies a bound on the integral coefficients of the 
polynomials defining 7 and ¢, so that there are only finitely many 
solutions of (62). 


COROLLARY. If $U(x, y)$ is a binary form of degree $n > 2$, with
coefficients in $Z$ and without repeated linear factors, and if $a \ne 0$ is a
rational integer, there are only finitely many rational integral solutions
of the equation $U(x, y) = a$. In particular, the equation

$$ax^n + by^n = c$$

has only finitely many solutions in $Z$ if $a$, $b$, and $c$ are in $Z$, $abc \ne 0$,
and $n \ge 3$.


This follows immediately from the theorem, with $K_0 = R$, $h = 1$,
and $n - 2h > 0$. The special case mentioned includes the higher-degree
analog of Pell's equation, $x^n - dy^n = N$.
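The corollary asserts finiteness but gives no list of solutions. A direct search over a box at least shows what the solution set looks like in a concrete case. The following Python sketch is an illustration only: the function name `thue_solutions`, the sample equation $x^3 - 2y^3 = 1$, and the search bound are choices made here, and a finite search of course proves nothing about points outside the box.

```python
def thue_solutions(a, b, c, n, bound):
    """All integer pairs (x, y) with |x|, |y| <= bound and a*x^n + b*y^n = c."""
    return sorted(
        (x, y)
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if a * x**n + b * y**n == c
    )


# The cubic analog of Pell's equation with d = 2 and N = 1: x^3 - 2 y^3 = 1.
print(thue_solutions(1, -2, 1, 3, 60))
```

Within this box the search finds only the pairs $(1, 0)$ and $(-1, -1)$, in contrast with the Pell equation $x^2 - 2y^2 = 1$, which has infinitely many solutions.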


In the above considerations, strong use was made of the homogeneity 
of U(z, y). If a Diophantine equation is not of the form specified in 
Theorem 4—16, it may still be possible to relate its solvability to that 
of one of this form. We now consider such a case. 


4-9 A special equation. It was conjectured by E. Catalan in 1842 
that 8 and 9 are the only two consecutive integers larger than 1 which 
are powers of other integers. This has never been proved; it has not 
even been shown that no three consecutive integers are powers, 
although it is trivial that no four can be, since one must be of the 




form 4k + 2. In slightly different terms, the problem is to show that 
the Diophantine equation 


yv—y=t!1 (64) 
has no solutions with w and z larger than 1, except for that men- 
tioned. Various special cases arise by fixing, or specializing in some 
other way, one or more of the variables in (64). The case we are now 
going to examine is that in which the exponents are fixed, so that we 
consider the equation 

$$x^m - y^n = 1. \tag{65}$$


Catalan’s conjecture would be proved if it could be shown that for 
each pair of integers m and n larger than 1, (65) has no positive solu- 
tions except that mentioned. Since this seems to be unfeasible, we 
consider the more modest question of whether (65) can have infinitely 
many solutions. This, at last, is a question that can be answered. 
It is a very weak consequence of the following theorem, due to 
Mahler, that (65) has only finitely many solutions if $m \ge 2$, $n \ge 3$.
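The statement about consecutive powers is easy to test by machine in a finite range. The sketch below is an illustration with hypothetical function names and an arbitrary search limit chosen here; it lists all pairs of consecutive integers up to the limit that are both perfect powers. (The conjecture itself was eventually settled by P. Mihăilescu in 2002, so the search result is what one should expect.)

```python
def perfect_powers(limit):
    """All integers a**b <= limit with a >= 2 and b >= 2."""
    powers = set()
    base = 2
    while base * base <= limit:
        value = base * base
        while value <= limit:
            powers.add(value)
            value *= base
        base += 1
    return powers


def consecutive_powers(limit):
    """Pairs (n, n + 1) of perfect powers with n + 1 <= limit."""
    powers = perfect_powers(limit)
    return sorted((n, n + 1) for n in powers if n + 1 in powers)


print(consecutive_powers(10**6))
```

Up to one million the only pair found is $(8, 9)$, that is, $3^2 - 2^3 = 1$.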


THEOREM 4-17. Suppose that $m \ge 2$, $n \ge 3$, $ab \ne 0$, $(x, y) = 1$.
Then as $\max(|x|, |y|) \to \infty$, the greatest prime factor of

$$ax^m + by^n$$

tends to infinity.

Since $x^2 - y^2 = 1$ has only the obvious solutions $x = \pm 1$, $y = 0$,
the new problem is completely solved. Unfortunately Mahler’s 
proof, which depends on a p-adic version of the Thue-Siegel theorem, 
cannot be included here. We can, however, obtain partial results of 
some interest. 

If mn is even, the fact that (65) has only finitely many solutions 
is a consequence of the next theorem, which is a special case of a 
theorem proved anonymously and published by L. J. Mordell. 


THEOREM 4-18. Let $f(x)$ be a polynomial of degree $n \ge 3$, with
coefficients in $Z$ and with distinct zeros, and let $a$ be any nonzero
rational integer. Then the equation

$$ay^2 = f(x) \tag{66}$$

has only finitely many solutions $x, y$ in $Z$.
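An equation of the form (66) that is directly related to (65) is $y^2 = x^3 + 1$, the case $a = 1$, $f(x) = x^3 + 1$, whose zeros are distinct. The Python sketch below is an illustration only, with a hypothetical function name and an arbitrary search bound; it lists the integer points with $y \ge 0$ in a finite range and says nothing about points outside it.

```python
from math import isqrt


def square_values(f, a, bound):
    """Pairs (x, y) with |x| <= bound, y >= 0 and a * y^2 = f(x)."""
    found = []
    for x in range(-bound, bound + 1):
        value = f(x)
        if value % a != 0:
            continue
        quotient = value // a
        if quotient < 0:
            continue
        y = isqrt(quotient)
        if y * y == quotient:
            found.append((x, y))
    return found


print(square_values(lambda x: x**3 + 1, 1, 1000))
```

In this range the only points found are $(-1, 0)$, $(0, 1)$, and $(2, 3)$; the last one is the relation $3^2 - 2^3 = 1$ behind the consecutive powers 8 and 9.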
Proof: Suppose that 
f(z) = ag(w — &) +++ @& — gn), 




and that (66) has infinitely many solutions. The numbers a; = aoé;, 
forj = 1,...,7n, are algebraic integers, and if (66) holds, then 


aa” 'y? = (aox — a1) +++ (agx — an). 


Let K = R(&,...,£,) be the splitting field of f. Any ideal in K 
dividing [apx — a,] and [aor — a;,] also divides [a; — a;], so that the 
norm of such a common divisor is a divisor of the discriminant d of f. 
Hence, if P is a prime ideal divisor of y and NP > d, then for some i, 
P?|{agx — a]. Since there are only finitely many ideals with norms 
smaller than d, and only finitely many divisors of aj”—!a, it follows 
that for each 7, 

[agzx — a] = BC’, (67) 


where $B_i$ and $C_i$ are ideals, and $B_i$ runs over a finite set of ideals.

Let $D$ run over a fixed system of representatives of the various ideal
classes in $K$; the number of $D$'s is finite. Then for each $i$ and some
$D$, $C_i \sim D$, so that

$$[\beta] C_i = [\delta] D,$$

for some $\beta$ and $\delta$. We shall show that $\beta$ can be chosen from a finite set
of integers of $K$. Let $([\beta], [\delta]) = E$, and put $[\beta] = EF$, $[\delta] = EG$.
Then $EFC_i = EDG$, whence $FC_i = DG$; thus $F \mid D$, and $F$ is one of a
finite set of ideals. By Theorem 3-2, there is an $H$ with norm less
than $c$ (so that $H$ is one of a finite set) such that $FH = [\gamma_i]$ is principal.
Thus

$$[\gamma_i] C_i = (GH) D.$$

Since $C_i \sim D$, also $[\gamma_i] \sim GH$; hence $GH = [\zeta_i]$, and

$$[\gamma_i] C_i = [\zeta_i] D,$$

where $\gamma_i$ is one of a finite set of integers.
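By (67),

$$[\gamma_i^2][a_0 x - \alpha_i] = B_i [\zeta_i^2] D^2,$$

from which it follows that $B_i D^2$ is principal, say $B_i D^2 = [\eta_i]$. Thus
for some unit $\varepsilon_i$,

$$\gamma_i^2 (a_0 x - \alpha_i) = \varepsilon_i \eta_i \zeta_i^2.$$

By Dirichlet's theorem on units, $\varepsilon_i$ can be written as $\varepsilon_i' {\varepsilon_i''}^2$, where
$\varepsilon_i'$ is one of a finite number of units. Finally, for $i = 1, \ldots, n$,

$$a_0 x - \alpha_i = \kappa_i \lambda_i^2,$$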




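where $\lambda_1, \ldots, \lambda_n$ are integers of $K$, and $\kappa_1, \ldots, \kappa_n$ are certain ones
of finitely many numbers of $K$. Hence

$$\kappa_1 \lambda_1^2 - \kappa_2 \lambda_2^2 = \alpha_2 - \alpha_1 \ne 0,$$
$$\kappa_2 \lambda_2^2 - \kappa_3 \lambda_3^2 = \alpha_3 - \alpha_2 \ne 0,$$
$$\kappa_3 \lambda_3^2 - \kappa_1 \lambda_1^2 = \alpha_1 - \alpha_3 \ne 0.$$

Now let $L = K(\sqrt{\kappa_1}, \sqrt{\kappa_2}, \sqrt{\kappa_3})$. Then, in $L$,

$$(\lambda_1 \sqrt{\kappa_1} - \lambda_2 \sqrt{\kappa_2})(\lambda_1 \sqrt{\kappa_1} + \lambda_2 \sqrt{\kappa_2}) = \alpha_2 - \alpha_1,$$

and since the denominators of $\kappa_1$ and $\kappa_2$ can be taken to be bounded,
it follows that

$$\lambda_1 \sqrt{\kappa_1} - \lambda_2 \sqrt{\kappa_2} = \beta_3 \varepsilon_3^l,$$

where $\beta_3$ is one of finitely many elements of $L$, $\varepsilon_3$ is a unit of $L$, and
$l > 1$ is an arbitrary positive integer. Similarly,

$$\lambda_2 \sqrt{\kappa_2} - \lambda_3 \sqrt{\kappa_3} = \beta_1 \varepsilon_1^l,$$
$$\lambda_3 \sqrt{\kappa_3} - \lambda_1 \sqrt{\kappa_1} = \beta_2 \varepsilon_2^l.$$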


But then

$$\frac{\beta_1}{\beta_3}\left(\frac{\varepsilon_1}{\varepsilon_3}\right)^l + \frac{\beta_2}{\beta_3}\left(\frac{\varepsilon_2}{\varepsilon_3}\right)^l = -1. \tag{68}$$

If there were only finitely many distinct ratios $\varepsilon_1/\varepsilon_3$, there would be a
finite set of coefficients $\sigma$ such that

$$\sqrt{a_0 x - \alpha_2} - \sqrt{a_0 x - \alpha_3} = \sigma\left(\sqrt{a_0 x - \alpha_1} - \sqrt{a_0 x - \alpha_2}\right)$$

for every solution $x$ of (66) and for suitable determination of the
radicals. This is clearly impossible, so (68) must have infinitely
many solutions in integers $\varepsilon_1/\varepsilon_3$, $\varepsilon_2/\varepsilon_3$ of $L$. But for $l$ sufficiently
large, this is in contradiction with Theorem 4-16. Hence the supposition
that (66) has infinitely many solutions is not tenable, and the
proof is complete.
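For a concrete equation of the form (66), the finitely many solutions promised by Theorem 4-18 can at least be searched for in a bounded range. The following Python sketch is only an illustration: the function name `integer_points` and the search bound are our own choices, and a finite search proves nothing about larger values of $x$.

```python
from math import isqrt

def integer_points(a, coeffs, bound):
    """Solutions (x, y) of a*y^2 = f(x) with |x| <= bound.

    coeffs lists the integer coefficients of f from the constant term up;
    a is a nonzero integer.
    """
    points = []
    for x in range(-bound, bound + 1):
        value = sum(c * x ** k for k, c in enumerate(coeffs))
        if value % a:
            continue                    # f(x) is not divisible by a
        square = value // a
        if square < 0:
            continue                    # y^2 cannot be negative
        y = isqrt(square)
        if y * y == square:
            points.append((x, y))
            if y:
                points.append((x, -y))
    return points

print(integer_points(1, [-2, 0, 0, 1], 1000))   # y^2 = x^3 - 2
```

For $y^2 = x^3 - 2$ the search over $|x| \le 1000$ finds only $(3, \pm 5)$.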


Returning to equation (65), we see that the only possible solutions 
have z = 0 or +1, if (m,n) > 1. For the problem that remains, it 
suffices to consider the case in which m = p and n = q are distinct 
odd primes. This was treated by M. Newman, whose work was not 
published. A slightly strengthened version of his result, obtained by 




applying Theorem 4-16 rather than the analogous consequence of the 
Thue-Siegel theorem, follows. 


THEOREM 4-19. If $p$ and $q$ are distinct odd primes such that
$q > 2(p - 1)$ and $q$ does not divide the class number of the cyclotomic
field $K_p = R(\zeta)$, where $\zeta = \exp(2\pi i/p)$, then the equations

$$x^p - y^q = \pm 1 \tag{69}$$

have only finitely many solutions $x$, $y$ in Z.


Proof: We carry out the proof only for the equation $x^p - y^q = 1$;
the alternate case requires only trivial modifications. Put $1 - \zeta = \pi$
and $[\pi] = P$, so that $P$ is a prime ideal of $K_p$, by Theorem 3-6. Let $h$
be the class number of $K_p$.

If $x$ and $y$ satisfy (69) with the plus sign, then

$$[x - 1][x - \zeta] \cdots [x - \zeta^{p-1}] = [y]^q. \tag{70}$$

Put

$$D_{rs} = [x - \zeta^r,\ x - \zeta^s] \quad \text{for } 0 \le r \le p - 1,\ 0 \le s \le p - 1,\ r \ne s.$$

Then

$$D_{rs} = [x - \zeta^r,\ \zeta^r - \zeta^s] = [x - 1 + 1 - \zeta^r,\ \zeta^r - \zeta^s]
= \left[x - 1 + \frac{1 - \zeta^r}{1 - \zeta}\,\pi,\ \frac{\zeta^r - \zeta^s}{1 - \zeta}\,\pi\right]
= [x - 1,\ \pi],$$

since $(\zeta^k - \zeta^l)/(1 - \zeta)$ is a unit if $p \nmid (k - l)$. Thus $D_{rs}$ is the same
for all $r$ and $s$, and, since $D_{rs} \mid P$ and $P$ is prime, either $D_{rs} = [1]$ or
$D_{rs} = P$. We consider the two cases separately.

If $D_{rs} = [1]$, then the ideals $[x - \zeta^r]$ are pairwise relatively prime;
since their product is a $q$th power, there are ideals $A_0, \ldots, A_{p-1}$
such that

$$[x - \zeta^r] = A_r^q, \qquad r = 0, \ldots, p - 1. \tag{71}$$

Suppose that $e_r$ is the smallest positive integer such that $A_r^{e_r}$ is principal;
by Theorem 3-4, $e_r \mid h$, and by (71), $e_r \mid q$. But $q$ is prime and $q \nmid h$,
so $e_r = 1$ and $A_r$ is principal. Hence there are integers $\alpha$ and $\beta$ and




units $\varepsilon$ and $\varepsilon'$ of $K_p$ such that $x - 1 = \varepsilon\alpha^q$ and $x - \zeta = \varepsilon'\beta^q$, whence

$$\varepsilon'\beta^q - \varepsilon\alpha^q = \pi. \tag{72}$$

By Theorem 2-45, the units of $K_p$ have a finite basis, so that each
unit has a representation $\varepsilon_1 \varepsilon_2^q$, where $\varepsilon_1$ is one of the finite number
of units obtained by taking products of powers of the basis elements,
with exponents non-negative and smaller than $q$. Thus (72) implies
that one of the finitely many equations

$$\varepsilon_1'(\varepsilon_2'\beta)^q - \varepsilon_1(\varepsilon_2\alpha)^q = \pi \tag{73}$$

must hold. But for each choice of $\varepsilon_1$ and $\varepsilon_1'$, (73) has only finitely
many integral solutions $\varepsilon_2\alpha$, $\varepsilon_2'\beta$ in $K_p$; this is evident from Theorem
4-16 with $K_1 = K_p$, $h = p - 1$, $n = q > 2(p - 1)$, $\nu = 0$. Hence
$x$, and therefore also $y$, has only finitely many possible values.

The proof for the case $D_{rs} = P$ proceeds similarly. We put
x —1=-7w and y = rz, where w and z are integers of K, with 
(x, z] = ({1]. Then (70) becomes 


and since the ideals on the left are pairwise relatively prime, there is
a $t$ with $0 \le t \le p - 1$ such that


1—¢° 
jot I. 


Thus there are ideals $A_0, \ldots, A_{p-1}$ such that


prap 


yt 
Ee E| = prmeas 


As before, it follows that all the ideals $A_r$ are principal (for $r = t$, use
the fact that an ideal equivalent to a principal ideal is principal).
Since $p > 2$, there are distinct rational integers $r$ and $s$ different from
$t$ such that $0 \le r \le p - 1$, $0 \le s \le p - 1$. Then for integers $\alpha$ and
$\beta$ and units $\varepsilon$ and $\varepsilon'$ of $K_p$,

ea aekee w+ i = ep, 


w+ 


so that 


and the expression on the right is not zero. The earlier reasoning 
shows that the theorem is also true in this case. 


PROBLEMS 


1. Extend Theorem 4-18 to the case that $f$ may have multiple zeros, but
has at least three distinct zeros of odd orders.

2. Deduce from the finiteness of the number of solutions of (66) that as
the integral variable $x$ tends to infinity, the greatest prime divisor of $f(x)$
does also. [Hint: Assume that for infinitely many $x$, $f(x)$ is a product of
powers of a fixed finite set of primes, and obtain a contradiction.]


REFERENCES 
Section 4-1 


See the following papers: J. Liouville, Journal des Mathématiques Pures 
et Appliquées (Paris) 16, 133-142 (1851); A. Thue, Journal fiir die Reine 
und Angewandte Mathematik (Berlin) 135, 284-305 (1909); C. L. Siegel, 
Mathematische Zeitschrift (Berlin) 10, 173-213 (1921); F. J. Dyson, Acta 
Mathematica (Stockholm) 79, 225-240 (1947); T. Schneider, Archiv der 
Mathematik (Karlsruhe) 1, 288-295 (1948-1949); K. F. Roth, Mathematika 
(London) 2, 1-20 (1955); Corrigendum, Mathematika 2, 168 (1955). 

The paper by Siegel contains many variants and applications of the 
Thue-Siegel theorem. 


Section 4-7 


The literature concerning Catalan’s conjecture is reviewed by R. Oblath, 
Revista Matematica Hispano-Americana (Madrid) 1, 122-140 (1941). 
Mahler’s theorem appeared in Nieuw Archief voor Wiskunde (Amsterdam) 
1, 113-122 (1953). Theorem 4-17 appeared in Journal of the London 
Mathematical Society 2, 66-68 (1926). 


CHAPTER 5 


IRRATIONALITY AND TRANSCENDENCE 


5-1 Irrational numbers. One of the oldest results in the theory
of numbers is that $\sqrt{2}$ is irrational; this was known to the Pythagoreans
in the fifth century B.C. The proof, when suitably generalized
with the help of the Unique Factorization Theorem, leads to the well- 
known rule for determining the possible rational zeros of a polynomial 
with rational integral coefficients; this in turn makes it possible to 
show, if such is the case, that a given polynomial has only irrational 
zeros. Thus the numbers given implicitly as zeros of polynomials can 
be trivially classified as rational or irrational. 

If a number is given by its decimal expansion, one has only to 
determine whether its digits eventually recur periodically to know 
whether or not it is irrational. For example, the number 


0.1234567891011..., 


whose successive digits are formed in an obvious fashion, is clearly 
irrational, since arbitrarily long blocks of a single digit occur, pre- 
cluding periodicity. Similarly, using the regular continued fraction 
expansion of a real number, one can identify not only the rational 
numbers but also the quadratic irrationalities. (Unfortunately,
there is no simple algorithm known which singles out the algebraic
numbers of fixed degree $n \ge 3$ in a distinctive way.)
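The remark about blocks of a single digit in $0.1234567891011\ldots$ is easy to observe experimentally. The sketch below is only an illustration (the helper names `champernowne_digits` and `longest_zero_run` are ours): it generates the digits and measures the longest run of zeros among the first few thousand of them. The runs come from the integers $10, 100, 1000, \ldots$ and grow without bound, which is what rules out an eventually periodic expansion.

```python
from itertools import count, islice

def champernowne_digits():
    """Yield the digits of 0.123456789101112... one at a time."""
    for k in count(1):
        for ch in str(k):
            yield int(ch)

def longest_zero_run(num_digits):
    """Length of the longest block of consecutive zeros among the first num_digits digits."""
    best = run = 0
    for d in islice(champernowne_digits(), num_digits):
        run = run + 1 if d == 0 else 0
        best = max(best, run)
    return best

for size in (189, 200, 3000):
    print(size, longest_zero_run(size))
```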

If a real number $x$ is not given in one of these convenient forms, the
problem of deciding whether or not it is rational may be decidedly
nontrivial. It is, for example, not known whether Euler's constant,
defined as

$$\lim_{n \to \infty} \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \log n\right),$$

is rational. Aside from properties of special algorithms, the only
method available for investigating such questions depends on the
following observation. If $x = a/b$ is rational, then for every pair of
integers $p$ and $q$, the number $qx - p$ is some integral multiple of $1/b$,


so that it is impossible to find an infinite sequence of pairs $p_n$ and $q_n$
such that

$$|q_1 x - p_1| > |q_2 x - p_2| > |q_3 x - p_3| > \cdots. \tag{1}$$

More generally, no such sequence can be found for which

$$|q_n x - p_n| \ne 0 \text{ for every } n, \quad \text{and} \quad \lim_{n \to \infty} |q_n x - p_n| = 0. \tag{2}$$

On the other hand, when $x$ is irrational there are infinitely many solutions
of the inequality

$$0 < |qx - p| < \frac{1}{q}.$$

We therefore have


THEOREM 5-1. Each of the following is a necessary and sufficient
condition for the irrationality of a real number $x$:

(a) there are integers $p_1, q_1, p_2, q_2, \ldots$, such that the inequalities (1)
hold;

(b) there are integers $p_1, q_1, p_2, q_2, \ldots$, such that the conditions (2)
hold.


As a simple application of this principle, we prove

THEOREM 5-2. The number $e$ is irrational.

Proof: We recall the expansion

$$\frac{1}{e} = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!}.$$

It is well known that if $a_0, a_1, \ldots$ is an unbounded increasing sequence
of positive numbers, then the series

$$\sum_{k=0}^{\infty} \frac{(-1)^k}{a_k} \tag{3}$$

converges to its sum $S$ in such a way that

$$0 < \left|S - \sum_{k=0}^{n} \frac{(-1)^k}{a_k}\right| < \frac{1}{a_{n+1}}$$

for $n \ge 0$. Hence if we put $q_n = n!$ and

$$p_n = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!},$$

then $p_n$ and $q_n$ are integers and

$$0 < \left|\frac{q_n}{e} - p_n\right| = n! \left|\frac{1}{e} - \sum_{k=0}^{n} \frac{(-1)^k}{k!}\right| < \frac{n!}{(n+1)!} = \frac{1}{n+1}.$$

It follows that $1/e$, and hence $e$ itself, is irrational. (This is a variant
of the original proof due to Fourier.) More generally, the same
argument shows that if the LCM of the integers $a_1, \ldots, a_n$ is $o(a_{n+1})$
as $n \to \infty$, then the series (3) converges to an irrational number.

For completeness, we give a proof due to I. Niven that $\pi$ is irrational.
It is short and simple to follow, but to one unfamiliar with
older work it must appear completely unmotivated.

THEOREM 5-3. The number $\pi$ is irrational.

Proof: Suppose on the contrary that $\pi = a/b$, where $a$ and $b$ are
integers. Put

$$f(x) = \frac{x^n (a - bx)^n}{n!}$$

and

$$F(x) = f(x) - f''(x) + f^{(4)}(x) - \cdots + (-1)^n f^{(2n)}(x),$$

where the positive integer $n$ will be specified later. Now $f(0) = f'(0) = \cdots = f^{(n-1)}(0) = 0$, and if we write

$$f(x) = \frac{a_0 x^n + a_1 x^{n+1} + \cdots + a_n x^{2n}}{n!},$$

we see that for $n \le k \le 2n$,

$$f^{(k)}(x) = \frac{1}{n!} \sum_{l=0}^{n} (n + l)(n + l - 1) \cdots (n + l - k + 1)\, a_l x^{n+l-k}
= \frac{1}{n!} \sum_{l=k-n}^{n} \frac{(n + l)!}{(n + l - k)!}\, a_l x^{n+l-k},$$

so that

$$f^{(k)}(0) = \frac{k!}{n!}\, a_{k-n}.$$

Hence $f^{(k)}(0) \in Z$, and since $f(x) = f(\pi - x)$, also $f^{(j)}(\pi) \in Z$, for
$0 \le j \le 2n$. Finally, $F(0)$ and $F(\pi)$ must be integers.


On the other hand,

$$\frac{d}{dx}\left(F'(x) \sin x - F(x) \cos x\right) = F''(x) \sin x + F(x) \sin x = f(x) \sin x,$$

so that

$$\int_0^{\pi} f(x) \sin x \, dx = \left[F'(x) \sin x - F(x) \cos x\right]_0^{\pi} = F(\pi) + F(0).$$

But for $0 < x < \pi$,

$$0 < f(x) \sin x < \frac{\pi^n a^n}{n!},$$

so that the above integral is positive but arbitrarily small for $n$
sufficiently large. But this is impossible, since $F(0) + F(\pi)$ is an
integer. The contradiction establishes the theorem.
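Both irrationality proofs rest on integrality statements that can be checked by exact rational arithmetic. The Python sketch below is an illustration only; the function names and the sample values $a = 22$, $b = 7$, $n = 4$ are our own assumptions. It verifies that $p_n = n! \sum_{k \le n} (-1)^k / k!$ is an integer with $0 < |q_n/e - p_n| < 1/(n+1)$ (using a longer partial sum as a stand-in for $1/e$), and that the derivatives $f^{(j)}(0)$ of Niven's polynomial $f(x) = x^n (a - bx)^n / n!$ are integers.

```python
from fractions import Fraction
from math import factorial

def e_inverse_partial(n):
    """Partial sum  sum_{k=0}^{n} (-1)^k / k!  as an exact fraction."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))

def e_gap(n, tail=30):
    """|q_n/e - p_n| with q_n = n!, where 1/e is replaced by a partial sum
    with n + tail + 1 terms (an approximation, not the exact value)."""
    q = factorial(n)
    p = q * e_inverse_partial(n)
    assert p.denominator == 1          # p_n is an integer since k! divides n!
    return abs(q * e_inverse_partial(n + tail) - p)

def niven_derivatives_at_zero(a, b, n):
    """Values f^(j)(0), 0 <= j <= 2n, for f(x) = x^n (a - b x)^n / n!."""
    poly = [Fraction(1)]               # coefficients of (a - b x)^n, low to high
    for _ in range(n):
        nxt = [Fraction(0)] * (len(poly) + 1)
        for i, c in enumerate(poly):
            nxt[i] += a * c
            nxt[i + 1] -= b * c
        poly = nxt
    # f has coefficient poly[l] / n! on x^(n + l), and f^(j)(0) = j! * coefficient of x^j
    coeffs = [Fraction(0)] * n + [c / factorial(n) for c in poly]
    return [factorial(j) * coeffs[j] for j in range(2 * n + 1)]

for n in range(1, 6):
    print(n, float(e_gap(n)), 1 / (n + 1))
print(niven_derivatives_at_zero(22, 7, 4))
```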


PROBLEM 


Given a real number $x$, define the sequence $\{x_k\}$ of real numbers and the
sequence $\{a_k\}$ of integers by the conditions

$$[x] = a_0, \qquad x_1 = x - [x],$$
$$x_1 = \frac{1}{a_1} + x_2, \quad \text{where } \frac{1}{a_1} \le x_1 < \frac{1}{a_1 - 1},$$
$$x_2 = \frac{1}{a_2} + x_3, \quad \text{where } \frac{1}{a_2} \le x_2 < \frac{1}{a_2 - 1},$$
$$\cdots$$
$$x_k = \frac{1}{a_k} + x_{k+1}, \quad \text{where } \frac{1}{a_k} \le x_k < \frac{1}{a_k - 1}.$$

Thus

$$x = a_0 + \frac{1}{a_1} + \frac{1}{a_2} + \cdots.$$

Show that this expansion terminates if and only if $x$ is rational. Show also
that if $x$ has an infinite series expansion

$$x = b_0 + \frac{1}{b_1} + \frac{1}{b_2} + \cdots,$$

where the numbers $b_k$ are integers with $b_{k+1} > b_k^2$, then $b_k = a_k$ for all $k$,
and $x$ is irrational.
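The expansion defined in this problem is a greedy one: at each step $a_k$ is the least integer with $1/a_k \le x_k$. A minimal Python sketch for rational input follows; the function name `greedy_unit_fractions` and the cap `max_terms` are our own choices, and the code only computes the expansion without proving any of the assertions of the problem.

```python
from fractions import Fraction
from math import ceil, floor

def greedy_unit_fractions(x, max_terms=20):
    """Return (a_0, [a_1, a_2, ...]) with x = a_0 + 1/a_1 + 1/a_2 + ...

    Each a_k is the least integer satisfying 1/a_k <= x_k, so that
    1/a_k <= x_k < 1/(a_k - 1).  The loop stops when the remainder is 0.
    """
    a0 = floor(x)
    rest = x - a0
    denominators = []
    while rest > 0 and len(denominators) < max_terms:
        a = ceil(1 / rest)
        denominators.append(a)
        rest -= Fraction(1, a)
    return a0, denominators

print(greedy_unit_fractions(Fraction(4, 5)))   # 4/5 = 1/2 + 1/4 + 1/20
```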




5-2 The existence of transcendental numbers. One class of
irrationals, the algebraic numbers, has been treated in some detail in
the preceding chapters. We now consider the complementary set of
transcendental numbers: those complex numbers which do not satisfy
any rational algebraic equation with coefficients in Z. It is by no
means obvious that this set is nonvacuous; the first proof, given by
Liouville in 1844, depends on the fact (see Theorem 4-1) that if $\alpha$
is algebraic of degree $n \ge 2$, then there is a constant $C$ such that
the inequality

$$\left|\alpha - \frac{p}{q}\right| < \frac{C}{q^n}$$

has no solution $p$, $q$ in Z. If a number $\xi$ can be found such that for
every $\omega > 0$ the inequality

$$0 < |q\xi - p| < \frac{1}{q^{\omega}}, \qquad q > 1, \tag{4}$$

has a solution, then $\xi$ cannot be algebraic of any degree, and must
therefore be transcendental.

An example of a Liouville number, for which (4) always has a
solution, is given by

$$\xi = \sum_{k=1}^{\infty} (-1)^k a^{-b_k},$$

where $a > 1$ is a fixed integer and $b_1, b_2, \ldots$ is an increasing sequence
of positive integers such that

$$\limsup_{k \to \infty} \frac{b_{k+1}}{b_k} = \infty.$$

For, given $\omega$, there is an $n = n(\omega)$ for which $b_{n+1}/b_n > \omega + 1$, and if
we put

$$q = a^{b_n}, \qquad p = q \sum_{k=1}^{n} (-1)^k a^{-b_k},$$

then $p$ and $q$ are integers, and

$$0 < |q\xi - p| < q \cdot a^{-b_{n+1}} = q^{1 - b_{n+1}/b_n} < q^{-\omega}.$$

It should be emphasized that the condition (4), while sufficient
for transcendence, is by no means necessary, even for real numbers.
For, using a modification of an argument due to Cantor, we can give
a second proof of the existence of transcendental numbers, and in
particular of numbers of this kind for which the inequality

$$\left|\xi - \frac{p}{q}\right| < \frac{1}{q^{2+\varepsilon}}$$

has only finitely many solutions for fixed $\varepsilon > 0$. It is known* that
there are uncountably many irrational numbers $\xi$ for which $M(\xi) = 3$,
where $M(\xi)$ is the supremum of the numbers $\lambda$ for which the inequality

$$\left|\xi - \frac{p}{q}\right| < \frac{1}{\lambda q^2}$$

has infinitely many solutions. Hence if the algebraic numbers are
countable, it follows that there are nonalgebraic numbers for which
$M(\xi) = 3$.

To order the algebraic numbers, we associate with each nonconstant
polynomial $P(x) = a_0 x^n + \cdots + a_n$ with integral coefficients
the number $h(P) = n + |a_0| + \cdots + |a_n|$. There are no
polynomials with $h(P) = 1$. If $h(P) = 2$, then $P(x) = x$ or $-x$. If
$h(P) = 3$, then $P(x)$ is one of $\pm x \pm 1$, $\pm 2x$, $\pm x^2$, all combinations
of signs being allowed. In general, it is clear that if $k \ge 2$, there
are only finitely many polynomials such that $h(P) = k$. Hence all
polynomials with integral coefficients can be arranged in a sequence:
first those with $h(P) = 2$, in some order, then those with $h(P) = 3$,
in some order, etc. Suppose that $P_1(x), P_2(x), \ldots$ is such a sequence.
Each $P_i(x)$ has finitely many zeros; write down all the zeros of
$P_1(x)$ in some order, then all those of $P_2(x)$ in some order, etc. Let
this sequence be $\beta_1, \beta_2, \ldots$. Now if $\beta_2 = \beta_1$, delete $\beta_2$; if $\beta_3 = \beta_2$
or $\beta_1$, delete $\beta_3$; and in general, if $\beta_k$ is equal to some $\beta$ with smaller
subscript, delete $\beta_k$. Then the resulting sequence $\alpha_1, \alpha_2, \ldots$ contains
all algebraic numbers, each just once.
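The first stages of this enumeration can be listed mechanically. The following Python sketch (the function name `polynomials_of_height` is ours) produces, for a given $k$, the coefficient tuples $(a_0, \ldots, a_n)$ of all nonconstant polynomials with $h(P) = k$; it is a brute-force search intended only for small $k$.

```python
from itertools import product

def polynomials_of_height(k):
    """Coefficient tuples (a_0, ..., a_n), n >= 1, a_0 != 0, with n + |a_0| + ... + |a_n| = k.

    a_0 is the leading coefficient, as in P(x) = a_0 x^n + ... + a_n.
    """
    found = []
    for n in range(1, k):
        budget = k - n                  # required value of |a_0| + ... + |a_n|
        for coeffs in product(range(-budget, budget + 1), repeat=n + 1):
            if coeffs[0] != 0 and sum(abs(c) for c in coeffs) == budget:
                found.append(coeffs)
    return found

for k in (1, 2, 3):
    print(k, polynomials_of_height(k))
```

For $k = 3$ the list has eight entries, corresponding to $\pm x \pm 1$, $\pm 2x$, $\pm x^2$.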

To summarize, if a number can be approximated sufficiently well 
by rational numbers, it is transcendental, but there are transcendental 
numbers which cannot be approximated even as well as some quad- 
ratic irrationalities. 


* See, for example, Volume I, Theorem 9-12, 




PROBLEMS 


1. Show that $\xi$ is a Liouville number if the partial quotients in its
continued fraction expansion,

$$\xi = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}},$$

have the property that

$$\limsup_{k \to \infty} \frac{\log a_{k+1}}{\log\left((a_1 + 1) \cdots (a_k + 1)\right)} = \infty.$$

[Hint: Show from the recursion relation for the successive convergents
that $q_k < (a_1 + 1) \cdots (a_k + 1)$, and then use Theorem 2-6.]


2. Investigate the implications of Theorem 4-15 as regards transcen- 
dental numbers. 


5-3 A criterion for transcendence. In order to obtain an approximability
condition which is equivalent to transcendence, we must
replace the linear expression $q\xi - p$ occurring in the inequality (4)
by a polynomial in $\xi$.

THEOREM 5-4. A real or complex number $\xi$ is transcendental if and
only if there corresponds to each $\omega > 0$ a positive integer $n$, such that
the inequality

$$0 < |x_0 + x_1 \xi + \cdots + x_n \xi^n| < X^{-\omega} \tag{5}$$

has infinitely many integral solutions $x_0, \ldots, x_n$, where
$X = \max(|x_0|, \ldots, |x_n|)$.

It is to be noticed that the Liouville numbers (those for which (4)
has a solution for each $\omega$) are precisely the numbers for which we
can take $n = 1$ for every $\omega$. In general, however, $n$ increases with $\omega$.

Proof: We first prove that the condition is sufficient. Let $\alpha = \alpha_1$
be algebraic of degree $g$, let $f(x) = a_0 + a_1 x + \cdots + a_g x^g$ be that
multiple of its defining polynomial which has relatively prime coefficients
in Z, with $a_g > 0$, and let $\alpha_1, \ldots, \alpha_g$ be its conjugates. Let
$h(x) = x_0 + x_1 x + \cdots + x_n x^n$ $(x_n > 0)$ be any polynomial with
integral coefficients, and with zeros $\beta_1, \ldots, \beta_n$ distinct from
$\alpha_1, \ldots, \alpha_g$. Then

$$0 < \left|\frac{1}{a_g^n} \prod_{i=1}^{n} f(\beta_i)\right| = \left|\prod_{i=1}^{n} \prod_{j=1}^{g} (\beta_i - \alpha_j)\right| = \left|\prod_{j=1}^{g} \prod_{i=1}^{n} (\beta_i - \alpha_j)\right| = \frac{1}{|x_n|^g} \prod_{j=1}^{g} |h(\alpha_j)|,$$



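so that if $X = \max(|x_0|, \ldots, |x_n|)$, then

$$0 < |h(\alpha)| = \frac{\left|x_n^g \prod_{i=1}^{n} f(\beta_i)\right|}{a_g^n \prod_{j=2}^{g} |h(\alpha_j)|} = \frac{\left|x_n^g \prod_{i=1}^{n} f(\beta_i)\right|}{a_g^n X^{g-1} \prod_{j=2}^{g} \left(|h(\alpha_j)|/X\right)}. \tag{6}$$

But

$$\prod_{i=1}^{n} f(\beta_i)$$

is a symmetric polynomial with integral coefficients in the $\beta$'s, of
degree $g$ in each $\beta_i$, and is therefore, by the Symmetric Function
Theorem, a polynomial of total degree $g$, with integral coefficients,
in the elementary symmetric functions $-x_{n-1}/x_n$, $x_{n-2}/x_n$, $\ldots$,
$\pm x_0/x_n$. Hence the numerator in the expression (6) is a positive
integer, and we have

$$|h(\alpha)| \ge \frac{1}{a_g^n X^{g-1} \prod_{j=2}^{g} \left(|h(\alpha_j)|/X\right)}.$$

Now if $r = \max_j |\alpha_j|$, then

$$\frac{|h(\alpha_j)|}{X} \le 1 + |\alpha_j| + |\alpha_j|^2 + \cdots + |\alpha_j|^n \le 1 + r + \cdots + r^n,$$

so that the quantity

$$\frac{1}{a_g^n \prod_{j=2}^{g} \left(|h(\alpha_j)|/X\right)}$$

has a positive lower bound $A(n, \alpha)$ depending only on $\alpha$ and $n$. Thus

$$|h(\alpha)| \ge \frac{A(n, \alpha)}{X^{g-1}}. \tag{7}$$

It follows that if (5) has infinitely many solutions with fixed $n$,
$\xi$ cannot be algebraic of degree less than $\omega + 1$. Since $\omega$ can be arbitrarily
large, $\xi$ cannot be algebraic.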

The necessity of the condition of Theorem 5-4 is a consequence 
of the following more general theorem. 




THEOREM 5-5. If $\vartheta_1, \ldots, \vartheta_n$ are complex numbers, then for a
suitable $c$ which depends only on $n$ and $\vartheta_1, \ldots, \vartheta_n$, the inequality

$$|x_0 + x_1 \vartheta_1 + \cdots + x_n \vartheta_n| < \frac{c}{X^{(n-1)/2}} \tag{8}$$

has infinitely many integral solutions $x_0, \ldots, x_n$.

If $\xi$ is transcendental, we can take $\vartheta_k = \xi^k$; since no polynomial
in $\xi$ vanishes, it follows that (5) has infinitely many solutions if
$n = [2\omega + 2]$.


Proof: The theorem is trivial if $n = 1$. For $n > 1$, put

$$c' = c'(\vartheta_1, \ldots, \vartheta_n) = 1 + |\vartheta_1| + \cdots + |\vartheta_n|,$$

let $h \ge 2$ be a positive integer, and let $x_0', x_1', \ldots, x_n'$ range independently
over the integers from $-h$ to $h$ inclusive. Since each of
the $n + 1$ numbers $x_i'$ can assume any of $2h + 1$ values, there are
$(2h + 1)^{n+1} = t$ expressions

$$L(\vartheta_1, \ldots, \vartheta_n) = x_0' + x_1' \vartheta_1 + \cdots + x_n' \vartheta_n, \qquad |x_i'| \le h.$$

Let these be, in some order, $L_1, \ldots, L_t$. Clearly

$$|L_i(\vartheta_1, \ldots, \vartheta_n)| \le c'h,$$

so that all the points $L_i(\vartheta_1, \ldots, \vartheta_n)$ lie in the square of side $2c'h$
with its center at the origin of the complex plane. Subdivide this
square into $m^2$ subsquares of side $2c'h/m$ each; then if $m^2 < t$,
there must be at least one subsquare containing more than one point
$L_i(\vartheta_1, \ldots, \vartheta_n)$. We can fulfill the condition $m^2 < t$ by taking

$$m = [(2h + 1)^{(n+1)/2}] - 1.$$

For this $m$, suppose that the points

$$L_1(\vartheta_1, \ldots, \vartheta_n) = x_0' + x_1' \vartheta_1 + \cdots + x_n' \vartheta_n$$

and

$$L_2(\vartheta_1, \ldots, \vartheta_n) = x_0'' + x_1'' \vartheta_1 + \cdots + x_n'' \vartheta_n$$

lie in a common subsquare; the distance between them does
not exceed the length of the diagonal of the subsquare, which is
$2\sqrt{2}\,c'h/m$. So if we put $x_0 = x_0' - x_0''$, $\ldots$, $x_n = x_n' - x_n''$ (so
that $X \le h - (-h) = 2h$), and

$$L(\vartheta_1, \ldots, \vartheta_n) = L_1(\vartheta_1, \ldots, \vartheta_n) - L_2(\vartheta_1, \ldots, \vartheta_n) = x_0 + x_1 \vartheta_1 + \cdots + x_n \vartheta_n,$$

then

$$|L(\vartheta_1, \ldots, \vartheta_n)| \le \frac{2\sqrt{2}\,c'h}{[(2h + 1)^{(n+1)/2}] - 1} < \frac{4c'h}{(2h)^{(n+1)/2}} = \frac{2c'}{(2h)^{(n-1)/2}} \le \frac{2c'}{X^{(n-1)/2}}. \tag{9}$$

Hence (8) has at least one solution, with $c = 2c'$.

If $L(\vartheta_1, \ldots, \vartheta_n) = 0$, then $xL(\vartheta_1, \ldots, \vartheta_n) = 0$ for every integer $x$,
so that the coefficient sets $xx_0, \ldots, xx_n$ all satisfy (8), and (8) has
infinitely many solutions. In the contrary case, choose $h_1$ so large that

$$\frac{2c'}{(2h_1)^{(n-1)/2}} < |L(\vartheta_1, \ldots, \vartheta_n)|,$$

and repeat the entire argument with $h$ replaced by $h_1$. Calling the
new form thus produced $L^{(1)}$, we have, by the analog of (9) and the
definition of $h_1$, that

$$|L^{(1)}(\vartheta_1, \ldots, \vartheta_n)| < |L(\vartheta_1, \ldots, \vartheta_n)|,$$

so that we have a second solution of (8). Continuing the process,
we can obtain arbitrarily many solutions.
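The box argument of this proof is constructive and can be carried out by machine for small $n$ and $h$. The sketch below is an illustration under our own naming (`small_linear_form`); it uses floating-point arithmetic, so the subsquare assignment near cell boundaries is only approximate, and it returns the first collision found rather than the best one.

```python
from itertools import product
from math import isqrt, sqrt

def small_linear_form(thetas, h):
    """Find integers x_0..x_n, not all zero, |x_i| <= 2h, with
    |x_0 + x_1*theta_1 + ... + x_n*theta_n| at most the subsquare diagonal.

    Returns (coefficients, |L|, diagonal bound 2*sqrt(2)*c'*h/m).
    """
    n = len(thetas)
    c1 = 1 + sum(abs(t) for t in thetas)        # c' in the text
    t_count = (2 * h + 1) ** (n + 1)            # number of forms L_i
    m = isqrt(t_count) - 1                      # [(2h+1)^((n+1)/2)] - 1, so m^2 < t
    side = 2 * c1 * h / m
    seen = {}
    for xs in product(range(-h, h + 1), repeat=n + 1):
        value = complex(xs[0] + sum(x * t for x, t in zip(xs[1:], thetas)))
        col = min(int((value.real + c1 * h) / side), m - 1)
        row = min(int((value.imag + c1 * h) / side), m - 1)
        if (row, col) in seen:                  # two points share a subsquare
            other = seen[(row, col)]
            coeffs = tuple(x - y for x, y in zip(xs, other))
            form = coeffs[0] + sum(x * t for x, t in zip(coeffs[1:], thetas))
            return coeffs, abs(form), 2 * sqrt(2) * c1 * h / m
        seen[(row, col)] = xs
    raise AssertionError("unreachable: there are more points than subsquares")

print(small_linear_form([2 ** 0.5, 1j * 3 ** 0.5], 4))
```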


PROBLEM 


Show that if the numbers $\vartheta_1, \ldots, \vartheta_n$ are real, then Theorem 5-5 remains
correct if the inequality (8) is replaced by

$$|x_0 + x_1 \vartheta_1 + \cdots + x_n \vartheta_n| < \frac{c}{X^n}.$$


5-4 Measure of transcendence. Mahler's classification. In light
of Theorem 5-4, we make the following definition: a function $\varphi(n, t)$
is called a transcendence measure for the transcendental number $\xi$ if
for each $n$ there is a constant $c_n$ such that for every $X \ge 1$,

$$|x_0 + x_1 \xi + \cdots + x_n \xi^n| > c_n \varphi(n, X)$$

for each set of integers $x_0, \ldots, x_n$ of height $X = \max(|x_0|, \ldots, |x_n|)$.
By Theorem 5-5, any such $\varphi(n, t)$ is no larger than $t^{-(n-1)/2}$.
A theorem giving a measure of transcendence of a number $\xi$ is
a refinement of the assertion that $\xi$ is transcendental; such measures
have been given for certain numbers. In Section 5-5 we shall determine
a measure of transcendence for $e$.

Mahler has elaborated on the theory of transcendence measure in
the following way. Let $z$ be a complex number, and put

$$\omega_n(X, z) = \omega_n(X) = \min \left|\sum_{k=0}^{n} x_k z^k\right|, \tag{10}$$

where the minimum is extended over all those sets of rational integral
coefficients $x_0, \ldots, x_n$ of heights at most $X$ for which

$$\sum_{k=0}^{n} x_k z^k \ne 0.$$

Then $\omega_n(X)$ is at most 1, and is a nonincreasing function of both $X$
and $n$. Put

$$\omega_n(X) = X^{-\rho_n(X)}, \tag{11}$$

so that

$$\rho_n(X) = \frac{\log(1/\omega_n(X))}{\log X},$$

and let

$$\omega_n(z) = \omega_n = \limsup_{X \to \infty} \rho_n(X),$$

$$\omega(z) = \omega = \limsup_{n \to \infty} \frac{\omega_n}{n}.$$

Each of $\omega_n$ and $\omega$ is either $+\infty$ or a non-negative number. If $\omega_n$ is
infinite and $n' > n$, then $\omega_{n'}$ is also infinite; hence there is an index
$\mu(z) = \mu$, which may be finite or infinite, such that $\omega_n$ is finite for
$n < \mu$ and infinite for $n \ge \mu$. The two quantities $\omega$, $\mu$ are never
finite simultaneously, for the finiteness of $\mu$ implies that there is an
$n < \infty$ such that $\omega_n = \infty$, whence $\omega = \infty$. The number $z$ is called

an A-number, if $\omega = 0$, $\mu = \infty$,
an S-number, if $0 < \omega < \infty$, $\mu = \infty$,
a T-number, if $\omega = \infty$, $\mu = \infty$,
a U-number, if $\omega = \infty$, $\mu < \infty$.
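For small $n$ and $X$ the minimum in (10) and the exponent in (11) can be found by exhaustive search. The following sketch is illustrative only: the names `omega` and `rho` are ours, the arithmetic is floating point (so it is suited to a number such as $e$, where only the zero polynomial gives the value 0), and nothing about the limits $\omega_n$ or $\omega$ can be inferred from such small ranges.

```python
from itertools import product
from math import e, log

def omega(n, X, z):
    """Least nonzero |x_0 + x_1 z + ... + x_n z^n| over integers with max |x_k| <= X."""
    best = None
    for xs in product(range(-X, X + 1), repeat=n + 1):
        value = abs(sum(x * z ** k for k, x in enumerate(xs)))
        if value > 0 and (best is None or value < best):
            best = value
    return best

def rho(n, X, z):
    """Exponent rho_n(X) defined by omega_n(X) = X^(-rho_n(X)); requires X >= 2."""
    return log(1 / omega(n, X, z)) / log(X)

for X in (2, 3, 4):
    print(X, omega(1, X, e), omega(2, X, e), rho(2, X, e))
```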


If $\mu$ is finite, then there is a fixed integer $n$ such that for every
$\omega > 0$ there are integers $x_0, \ldots, x_n$ such that

$$0 < |x_0 + x_1 z + \cdots + x_n z^n| < X^{-\omega}.$$

For the case $n = 1$, this is exactly the definition of the Liouville
numbers, so that the U-numbers may be regarded as higher degree
analogs of Liouville numbers. The author has shown that there are
U-numbers of every degree.

If $z$ is algebraic, the inequality (7) shows that $\rho_n(X)$, and hence
also $\omega_n$, remains bounded as $n \to \infty$, so that $\omega = 0$ and $z$ is an
A-number. If, on the other hand, $z$ is transcendental, it follows from
Theorem 5-5 that $\rho_n \ge \frac{1}{2}(n - 1)$, whence $\omega \ge \frac{1}{2}$. Thus the A-numbers
are precisely the algebraic numbers.

The existence of T-numbers has never been proved.


THEOREM 5-6. If the complex numbers $z$ and $w$ are algebraically
dependent, that is, if there is a polynomial $F(x, y)$ with coefficients in
Z such that $F(z, w) = 0$, then they belong to the same class.

Proof: If $z$ is algebraic and $w$ is algebraically dependent on $z$,
then $w$ is clearly also algebraic. We may therefore suppose that $z$
and $w$ are transcendental.

Let

$$F(x, y) = \sum_{h=0}^{M} \sum_{k=0}^{N} a_{hk} x^h y^k,$$

and suppose that $F$ is irreducible. (One consequence of this assumption
is that no polynomial in $x$ alone is a factor of $F$.) Write

$$F(x, y) = \sum_{h=0}^{M} A_h(y) x^h,$$

where

$$A_h(y) = \sum_{k=0}^{N} a_{hk} y^k.$$

We may suppose that $A_M(y)$ is not identically zero.

Let $A(x) = a_0 + \cdots + a_n x^n$ be a polynomial for which the
minimum is achieved in the definition (10) of $\omega_n(X, z)$, so that in
particular $\max(|a_i|) \le X$. We shall obtain inequalities relating
$\omega(z)$ and $\omega(w)$; since in the definition of these quantities the first
limit is taken on $X$, we temporarily regard $n$ as fixed and $X$ as a
parameter.

Since it is not the case that for each fixed $y$ the polynomials $F(x, y)$
and $A(x)$ have a common zero, we know by a standard theorem* that

* See, for example, B. L. van der Waerden, Modern Algebra (English 


edition, translated by Fred Blum from the second revised German edition), 
New York: Frederick Ungar Publishing Co., 1949, Vol. 1, pp. 83-85. 




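the resultant

$$R(y) = \begin{vmatrix}
a_0 & a_1 & \cdots & a_n & 0 & \cdots & 0 \\
0 & a_0 & a_1 & \cdots & a_n & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & a_0 & a_1 & \cdots & a_n \\
A_0(y) & A_1(y) & \cdots & A_M(y) & 0 & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & A_0(y) & A_1(y) & \cdots & A_M(y)
\end{vmatrix},$$

in which the upper block has $M$ rows and the lower block has $n$ rows,
is not identically zero. $R(y)$ is a polynomial in $y$ of degree $nN$ at
most, with coefficients in Z. Since $F$ is a fixed polynomial throughout,
the coefficients in $R(y)$ do not exceed $c_1 X^M$, where $c_1$ is a constant
depending only on $n$ and $F$.

If for each $l$ with $2 \le l \le M + n$, the $l$th column in the determinant
for $R(y)$ is multiplied by $x^{l-1}$ and added to the first column, the
new first column is

$$A(x),\ xA(x),\ \ldots,\ x^{M-1}A(x),\ F(x, y),\ xF(x, y),\ \ldots,\ x^{n-1}F(x, y).$$

Expanding by minors of the new first column, we obtain an identity

$$R(y) = A(x) g(x, y) + F(x, y) h(x, y),$$

from which

$$R(w) = A(z) g(z, w).$$

Regarding $g(x, y)$ as a sum of minors, we see that its coefficients are
rational integers not exceeding $c_2 X^{M-1}$ in absolute value, so that

$$|g(z, w)| < c_3 X^{M-1}.$$

Hence

$$|A(z)| \ge c_3^{-1} X^{-(M-1)} |R(w)|.$$

But

$$|R(w)| \ge \omega_{nN}(c_1 X^M, w),$$

so

$$|A(z)| \ge c_3^{-1} X^{-(M-1)} \omega_{nN}(c_1 X^M, w).$$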


It follows from the definition of $A(x)$ that

$$\omega_n(X, z) \ge c_3^{-1} X^{-(M-1)} \omega_{nN}(c_1 X^M, w),$$

and so we obtain

$$\omega_n(z) = \limsup_{X \to \infty} \frac{\log(1/\omega_n(X, z))}{\log X} \le M - 1 + M\omega_{nN}(w),$$

$$\omega(z) = \limsup_{n \to \infty} \frac{\omega_n(z)}{n} \le \limsup_{n \to \infty} \frac{M - 1 + M\omega_{nN}(w)}{n} \le MN\omega(w),$$

and

$$\mu(w) \le N\mu(z).$$

By symmetry,

$$\omega(w) \le MN\omega(z) \quad \text{and} \quad \mu(z) \le M\mu(w).$$

Thus $\omega(z)$ and $\omega(w)$ are simultaneously finite or infinite, as are $\mu(z)$
and $\mu(w)$; hence $z$ and $w$ are in the same class.


5-5 Arithmetic properties of the exponential function. In this
section we shall prove a theorem due to Mahler which simultaneously
shows that $e$ is an S-number (and therefore transcendental), gives a
transcendence measure for $e$, and shows that $\pi$ is transcendental.
The transcendence measure is not the most precise one known, but
more exact results are more difficult to prove.

We begin with an algebraic analog of Theorem 5-5. Let $\omega_1, \ldots, \omega_m$
be distinct complex numbers (having no connection with the function
$\omega_n(z)$ of the preceding section), and let $r_1, \ldots, r_m$ be positive integers.
Instead of asking for rational integers $x_0, \ldots, x_m$ for which the
quantity $x_0 + x_1 \omega_1 + \cdots + x_m \omega_m$ is numerically small, we shall
investigate the polynomials

$$A_k(z) = A_k(z; r_1, \ldots, r_m; \omega_1, \ldots, \omega_m), \qquad k = 1, \ldots, m,$$

of respective degrees $r_1 - 1, \ldots, r_m - 1$ at most, for which the
function

$$R(z) = R(z; r_1, \ldots, r_m; \omega_1, \ldots, \omega_m) = A_1(z) e^{\omega_1 z} + \cdots + A_m(z) e^{\omega_m z} \tag{12}$$

is algebraically small, i.e., has a Maclaurin expansion beginning with
a large power of $z$. The total number of coefficients among the




polynomials $A_k(z)$ is $r = r_1 + \cdots + r_m$; if they are taken as undetermined
constants, then the conditions

$$R(0) = 0, \quad R'(0) = 0, \quad \ldots, \quad R^{(r-2)}(0) = 0$$

yield a system of $r - 1$ linear homogeneous equations in these
$r$ unknowns. Such a system always has solutions distinct from
$(0, 0, \ldots, 0)$. Let $R(z)$ temporarily designate any of the functions
obtained in this manner; thus $R(z)$, which is not identically zero,
certainly has a zero of order $r - 1$ at $z = 0$, and could conceivably
have one of higher order there. Suppose that the actual order is
$r - 1 + E$, so that $R(z)$ has an expansion

$$R(z) = \sum_{h=r+E-1}^{\infty} a_h z^h, \qquad a_{r+E-1} \ne 0.$$

The non-negative integer $E$ is called the excess, and $m$ is called the
order, of $R(z)$. We first show that the excess is always equal to zero.
At least one of the polynomials $A_k(z)$ does not vanish identically,
and with no loss in generality we may suppose it to be $A_1(z)$. It is
easily proved by induction that if $D = d/dz$,

$$D^a e^{\omega z} A(z) = e^{\omega z} (D + \omega)^a A(z) \tag{13}$$

for every positive integer $a$ and every function $A(z)$ with sufficiently
many derivatives. Moreover, if $A(z)$ is a polynomial which is not
identically zero, and $\omega \ne 0$, then $(D + \omega)^a A(z)$ is a polynomial of
the same degree as $A(z)$. Hence

$$D^{r_m} e^{-\omega_m z} R(z) = D^{r_m}\left(A_1(z) e^{(\omega_1 - \omega_m) z} + \cdots + A_{m-1}(z) e^{(\omega_{m-1} - \omega_m) z} + A_m(z)\right)$$
$$= A_1^*(z) e^{(\omega_1 - \omega_m) z} + \cdots + A_{m-1}^*(z) e^{(\omega_{m-1} - \omega_m) z}$$
$$= R(z; r_1, \ldots, r_{m-1}; \omega_1 - \omega_m, \ldots, \omega_{m-1} - \omega_m),$$

where $A_1^*$ is not identically zero and, as implied by the notation,
$\deg A_k^* \le r_k - 1$ for $k = 1, \ldots, m - 1$. Clearly

$$R^{(\rho)}(0; r_1, \ldots, r_{m-1}; \omega_1 - \omega_m, \ldots, \omega_{m-1} - \omega_m) = 0$$

for $\rho = 0, 1, \ldots, r + E - r_m - 2$, so that from an R-function of
order $m$ and excess $E$ we have obtained another of order $m - 1$ and
excess $E$. Repeating the process, we come finally to a function
$R_1(z) = R(z; r_1; \omega_1 - \omega_2) = A(z) e^{(\omega_1 - \omega_2) z}$ of order 1 and excess $E$. But if
$A(z) = a_0 + a_1 z + \cdots + a_{r_1 - 1} z^{r_1 - 1}$, the conditions $R_1(0) = \cdots =
R_1^{(r_1 - 2)}(0) = 0$ give $a_0 = \cdots = a_{r_1 - 2} = 0$, so that there is certainly




no such function which does not vanish identically, if $E > 0$. Hence
$R^{(r-1)}(0) \ne 0$, or equivalently, the coefficient of $z^{r-1}$ in the Maclaurin
expansion of $R(z)$ is not zero, while all preceding coefficients are zero.
Introducing an appropriate numerical factor, we can put

$$R(z) = \frac{z^{r-1}}{(r - 1)!} + \sum_{h=r}^{\infty} b_h z^h.$$

The function $R$ and the coefficients $b_r, b_{r+1}, \ldots$ are now uniquely
determined, since if there were two such functions for given
$\omega_1, \ldots, \omega_m$, $r_1, \ldots, r_m$, their difference would have positive excess.
Moreover, while we have so far known only that not all of the polynomials
$A_1(z), \ldots, A_m(z)$ are identically zero, we now see that in
fact they are of exact degrees $r_1 - 1, \ldots, r_m - 1$, respectively,
since otherwise we could have begun with lower degree polynomials
and arrived at a function of positive excess. Finally, we see that
$R(z)$ is symmetric in the pairs of arguments $r_1, \omega_1; \ldots; r_m, \omega_m$,
since the pairs can be permuted while the solution (subject to all the
imposed conditions) is unique. This can also be seen by noting that
$R(z)$ is the unique solution of the homogeneous linear differential
equation

$$(D - \omega_1)^{r_1} \cdots (D - \omega_m)^{r_m} y = 0$$

for which $R(0) = \cdots = R^{(r-2)}(0) = 0$ and $R^{(r-1)}(0) = 1$, and the
factors in the differential operator may be permuted at will.
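The characterization of $R(z)$ by a differential equation with prescribed initial values gives a simple way to compute its Maclaurin coefficients. The sketch below is an illustration under our own naming: it expands the characteristic polynomial $\prod (s - \omega_k)^{r_k}$ and uses the resulting linear recurrence for the derivatives $R^{(h)}(0)$. For a single pair $(r, \omega)$ the output can be compared with the derivatives of $z^{r-1} e^{\omega z}/(r-1)!$, namely $\binom{h}{r-1} \omega^{h-r+1}$, a function which is readily seen to satisfy all the conditions.

```python
from fractions import Fraction
from math import comb

def char_poly(pairs):
    """Coefficients (low to high) of prod (s - w)^r over the pairs (r, w)."""
    poly = [Fraction(1)]
    for r, w in pairs:
        for _ in range(r):
            nxt = [Fraction(0)] * (len(poly) + 1)
            for i, c in enumerate(poly):
                nxt[i] -= w * c
                nxt[i + 1] += c
            poly = nxt
    return poly

def derivatives_at_zero(pairs, count):
    """R^(h)(0) for 0 <= h < count, where R solves the differential equation with
    R(0) = ... = R^(r-2)(0) = 0 and R^(r-1)(0) = 1."""
    chi = char_poly(pairs)
    r = len(chi) - 1
    ys = [Fraction(0)] * (r - 1) + [Fraction(1)]
    while len(ys) < count:
        h = len(ys) - r
        ys.append(-sum(chi[j] * ys[h + j] for j in range(r)))
    return ys[:count]

def closed_form_single_pair(r, w, h):
    """h-th derivative at 0 of z^(r-1) e^(w z) / (r-1)!."""
    return comb(h, r - 1) * Fraction(w) ** (h - r + 1) if h >= r - 1 else Fraction(0)

print(derivatives_at_zero([(3, Fraction(2))], 6))
```

Since the recurrence depends only on the product of the factors, the symmetry of $R(z)$ in the pairs $r_k, \omega_k$ is visible directly in this computation.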
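We now obtain explicit expressions for $R(z)$ and the $A_k(z)$. Clearly

$$R(z; r_1; \omega_1) = \frac{z^{r_1 - 1}}{(r_1 - 1)!}\, e^{\omega_1 z}, \tag{14}$$

since this function has all the requisite properties and there is only
one such function. Suppose that $R(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1})$
has already been determined. Then if $J$ is the operator

$$J = \int_0^z \ldots \, dz,$$

we have by (13) that

$$(D - \omega_1)^{r_1} \cdots (D - \omega_\mu)^{r_\mu} \left\{e^{\omega_\mu z} J^{r_\mu}\left(e^{-\omega_\mu z} R(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1})\right)\right\}$$
$$= (D - \omega_1)^{r_1} \cdots (D - \omega_{\mu-1})^{r_{\mu-1}} \left\{e^{\omega_\mu z} D^{r_\mu} J^{r_\mu} e^{-\omega_\mu z} R(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1})\right\}$$
$$= (D - \omega_1)^{r_1} \cdots (D - \omega_{\mu-1})^{r_{\mu-1}} R(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1}) = 0,$$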




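and since $R(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1})$ has a zero of order
$r_1 + \cdots + r_{\mu-1} - 1$ at 0, the function

$$e^{\omega_\mu z} J^{r_\mu}\left(e^{-\omega_\mu z} R(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1})\right)$$

has a zero of order $r_1 + \cdots + r_\mu - 1$ at 0, and it clearly has leading
coefficient $((r_1 + \cdots + r_\mu - 1)!)^{-1}$. Hence

$$R(z; r_1, \ldots, r_\mu; \omega_1, \ldots, \omega_\mu) = e^{\omega_\mu z} J^{r_\mu}\left(e^{-\omega_\mu z} R(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1})\right), \tag{15}$$

and consequently

$$R(z) = \left(e^{\omega_m z} J^{r_m}\right)\left(e^{(\omega_{m-1} - \omega_m) z} J^{r_{m-1}}\right) \cdots \left(e^{(\omega_2 - \omega_3) z} J^{r_2}\right) e^{(\omega_1 - \omega_2) z}\, \frac{z^{r_1 - 1}}{(r_1 - 1)!}.$$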

We now use the standard formula 


z — #\a—1 
pe) = [S956 a 


which is easily verified by integration by parts. We have 


gil 
e(e2—-e3) z yr2 (corn: ees 
(ry = 1) ! 


—1 1 
e(@x—-w3) 2 [ ° a. @ = ti) oo e(e1—we) 1 dt, 

0 (m—1)! (—D! 
_ ‘ i717} (2 ae t,)" 
0 (71 —1)! (e—-1)! 


and by induction we see that 


4 tm—1 
R(z) =— I dtm—1 f dtm—o oer 
0 0 


I — (tg—t1)"?* ++ + (tm —1 —bm—2)?™ 2! (z — tp) 
0 (73-1) '(re—1) ! - - - (tm—-1—1)'(Tm—1)! 


4 eu +w2(le —t)) +--+ +om(z on dt, ; (16) 


eeltitwe(z—ti)—w32 dt, 
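
The single-integral form of the repeated integration operator can be checked exactly on monomials. The following is only a sketch: it assumes the formula in the form $J^{\alpha} f(z) = \int_0^z (z-t)^{\alpha-1} f(t)\,dt/(\alpha-1)!$, evaluates both sides at z = 1 for $f(t) = t^n$, and uses hypothetical function names that do not come from the text.

```python
from fractions import Fraction
from math import comb, factorial

def repeated_integral(n, alpha):
    """Value at z = 1 of t**n integrated alpha times from 0."""
    # Integrating c*z**k from 0 to z gives c*z**(k+1)/(k+1).
    coeff = Fraction(1)
    for step in range(1, alpha + 1):
        coeff /= (n + step)
    return coeff

def single_integral(n, alpha):
    """Value at z = 1 of the integral over [0, 1] of
    (1 - t)**(alpha - 1) / (alpha - 1)! * t**n."""
    total = Fraction(0)
    for j in range(alpha):
        # Binomial expansion of (1 - t)**(alpha - 1), integrated term by term.
        total += Fraction((-1) ** j * comb(alpha - 1, j), n + j + 1)
    return total / factorial(alpha - 1)
```

Both functions return exact rationals, so agreement can be tested with `==` rather than with a floating-point tolerance.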


Before deducing an explicit formula for $A_k(z)$, we recall certain
properties of inverse operators. The operator $D^{-1}$, as applied to an
integral combination f(z) of polynomials and exponential functions,
yields that antiderivative which contains no constant of integration.
Hence

$$D^{-\rho} f(z) = J^{\rho} f(z) + \varphi(z),$$

where $\varphi$ is a function annihilated by $D^{\rho}$, that is, a polynomial of
degree $\rho - 1$ at most. More generally, if $\omega \ne 0$ we define, by analogy
to (13),

$$(D - \omega)^{-\rho} f(z) = e^{\omega z} D^{-\rho}(e^{-\omega z} f(z)),$$

so that

$$(D - \omega)^{-\rho} f(z) = e^{\omega z} J^{\rho}(e^{-\omega z} f(z)) + \psi(z), \tag{17}$$

where $\psi(z)$ is annihilated by $(D - \omega)^{\rho}$; that is, it is $e^{\omega z}$ times a
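polynomial of degree $\rho - 1$ at most. Since

$$(D - \omega)\left( \frac{z^n}{\omega} + \frac{n z^{n-1}}{\omega^2} + \frac{n(n-1) z^{n-2}}{\omega^3} + \cdots + \frac{n!}{\omega^{n+1}} \right) = -z^n,$$

and since no term of the operand is annihilated by $D - \omega$, we can
write

$$(D - \omega)^{-1} z^n = -\sum_{r=0}^{n} \frac{n!}{r!}\, \frac{z^r}{\omega^{n-r+1}} = -\left( \frac{1}{\omega} + \frac{D}{\omega^2} + \cdots + \frac{D^n}{\omega^{n+1}} \right) z^n.$$

More generally, it can be shown that if F is any polynomial of degree
n for which $F(0) \ne 0$, then $(F(D))^{-1} z^n$ can be written as

$$(a_0 + a_1 D + \cdots + a_n D^n) z^n, \tag{18}$$

where $a_0 + \cdots + a_n u^n$ is the Maclaurin expansion of $(F(u))^{-1}$ to
n + 1 terms.*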
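
As a sanity check on the inverse operator, one can apply $D - \omega$ to the candidate polynomial $-\sum_{r=0}^{n} (n!/r!)\, z^r/\omega^{n-r+1}$ and confirm that $z^n$ comes back. The sketch below does this with exact rational arithmetic; the coefficient-list representation and the function names are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction
from math import factorial

def apply_D_minus_omega(coeffs, omega):
    """Apply (D - omega) to a polynomial [c0, c1, ...], where c_r multiplies z**r."""
    n = len(coeffs)
    out = []
    for r in range(n):
        # Coefficient of z**r in the derivative is (r + 1) * c_{r+1}.
        deriv = (r + 1) * coeffs[r + 1] if r + 1 < n else 0
        out.append(deriv - omega * coeffs[r])
    return out

def inverse_on_monomial(n, omega):
    """Coefficients of -sum_{r=0}^{n} n!/r! * z**r / omega**(n - r + 1)."""
    return [-Fraction(factorial(n), factorial(r)) / omega ** (n - r + 1)
            for r in range(n + 1)]

def check(n, omega):
    """True if (D - omega) maps the candidate polynomial back to z**n."""
    omega = Fraction(omega)
    image = apply_D_minus_omega(inverse_on_monomial(n, omega), omega)
    return image == [Fraction(0)] * n + [Fraction(1)]
```

For example, `check(3, 2)` compares the image with the coefficient list of $z^3$; any nonzero rational $\omega$ can be used.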


We can now prove that for $k = 1, \ldots, m$,

$$A_k(z) = \prod_{\substack{h=1 \\ h \ne k}}^{m} (D + \omega_k - \omega_h)^{-r_h}\, \frac{z^{r_k-1}}{(r_k-1)!}. \tag{19}$$


For m = 1, the empty product is interpreted as the identity operator, 
of course, and in this case the correctness of (19) follows from equa- 
tion (14). Suppose that it is correct for all polynomials 


*A more complete discussion of inverse operators is given in E. L. Ince, 


Ordinary Differential Equations, New York: Longmans, Green & Co., 
Inc., 1926; reprinted by Dover Publications, New York, 1944; pp. 138-140. 


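with $\mu - 1$ pairs $r_k, \omega_k$. Then, by (15) and (17),

$$R(z; r_1, \ldots, r_\mu; \omega_1, \ldots, \omega_\mu)
= e^{\omega_\mu z} J^{r_\mu} \left( \sum_{k=1}^{\mu-1} A_k(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1})\, e^{(\omega_k - \omega_\mu) z} \right)$$
$$= \sum_{k=1}^{\mu-1} e^{\omega_\mu z} \left\{ e^{(\omega_k - \omega_\mu) z} (D + \omega_k - \omega_\mu)^{-r_\mu} A_k(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1}) + p_k(z) \right\}$$
$$= \sum_{k=1}^{\mu-1} e^{\omega_k z} (D + \omega_k - \omega_\mu)^{-r_\mu} A_k(z; r_1, \ldots, r_{\mu-1}; \omega_1, \ldots, \omega_{\mu-1}) + P(z)\, e^{\omega_\mu z},$$

where $p_k(z)$ and P(z) are polynomials of degree $r_\mu - 1$ at most. It
follows that (19) is correct for k = 1, for arbitrary m, and its truth
for $k \ge 2$ follows from the previously noted symmetry of R(z) in
the pairs $\omega_k, r_k$.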

For fixed complex numbers $\omega_1, \ldots, \omega_m$, our considerations up to
this point are valid for all the functions $R(z; r_1, \ldots, r_m; \omega_1, \ldots, \omega_m)$
corresponding to arbitrary sets $r_1, \ldots, r_m$ of positive integers.
We now specialize the parameters so as to obtain a collection of
functions depending on a single parameter $\rho$.

For h and k in the sequence 1, 2, ..., m, define

$$\delta_{hk} = \begin{cases} 1 & \text{if } h = k, \\ 0 & \text{if } h \ne k, \end{cases}$$

and put

$$R_h(z) = R_h(z; \rho; \omega_1, \ldots, \omega_m) = R(z; \rho + \delta_{1h}, \ldots, \rho + \delta_{mh}; \omega_1, \ldots, \omega_m),$$
$$A_{hk}(z) = A_{hk}(z; \rho; \omega_1, \ldots, \omega_m) = A_k(z; \rho + \delta_{1h}, \ldots, \rho + \delta_{mh}; \omega_1, \ldots, \omega_m).$$

Here $\rho$ is a fixed but arbitrary positive integer. We form the square
matrix

$$A(z) = (A_{hk}(z)), \qquad h, k = 1, \ldots, m,$$

having determinant D(z). Let the minor determinant of $A_{hk}(z)$ in
D(z) be $D_{hk}(z)$.
Now $A_{hk}(z)$ is a polynomial in z of degree $\rho + \delta_{hk} - 1$, and the
coefficient of the highest power of z is, by (19),

$$\frac{1}{(\rho + \delta_{hk} - 1)!} \prod_{\substack{l=1 \\ l \ne k}}^{m} (\omega_k - \omega_l)^{-\rho - \delta_{hl}}.$$




Hence in the expansion of D(z), the term formed from the elements of
the main diagonal will be of higher degree than any other term, and
D(z) is therefore a polynomial of degree $m\rho$ with the coefficient of the
highest power of z equal to

$$(\rho!)^{-m} \prod_{h \ne k} (\omega_h - \omega_k)^{-\rho}.$$

If, on the other hand, we solve the system of equations

$$\sum_{k=1}^{m} A_{hk}(z)\, e^{\omega_k z} = R_h(z), \qquad h = 1, \ldots, m,$$

for $e^{\omega_k z}$, we obtain the identity

$$D(z)\, e^{\omega_k z} = \sum_{h=1}^{m} (-1)^{h+k} D_{hk}(z) R_h(z).$$

Since the expansion of $R_h(z)$ begins with the term $z^{m\rho}/(m\rho)!$, the
polynomial D(z) is divisible by $z^{m\rho}$. Hence

$$D(z) = (\rho!)^{-m} z^{m\rho} \prod_{h \ne k} (\omega_h - \omega_k)^{-\rho}, \tag{20}$$

and D(z) vanishes only at z = 0.

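Let $c_1, c_2, \ldots$ be positive constants depending only on $m, \omega_1, \ldots, \omega_m$.
(In particular, they must not depend on $\rho$, which will eventually be
large.)

Examination of equation (16) shows that for $1 \le h \le m$,

$$R_h(1) = O(c_1^{\rho}\, \rho!^{-m}).$$

From (19), we obtain

$$A_{hk}(z) = \prod_{\substack{l=1 \\ l \ne k}}^{m} \sum_{\lambda_l=0}^{\infty} \binom{-\rho - \delta_{hl}}{\lambda_l} (\omega_k - \omega_l)^{-\rho - \delta_{hl} - \lambda_l} D^{\lambda_l} \cdot \frac{z^{\rho + \delta_{hk} - 1}}{(\rho + \delta_{hk} - 1)!},$$
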

where the sums need not be extended past the index $\rho$. Let

$$\Omega = \prod_{\substack{h,k=1 \\ h<k}}^{m} (\omega_h - \omega_k),$$

so that $\Omega$ can be regarded as a polynomial of total degree m(m - 1)/2
in $\omega_1, \ldots, \omega_m$, with coefficients in Z not exceeding $2^m$ in absolute
value. Since no exponent $\rho + \delta_{hl} + \lambda_l$ in the above sums exceeds
$2\rho + 1$, the expression

$$a_{hk} = \Omega^{2\rho+1} \rho!\, A_{hk}(1)$$

is a polynomial in $\omega_1, \ldots, \omega_m$, of total degree

$$\frac{m(m-1)}{2}\,(2\rho + 1)$$

at most, whose coefficients are rational integers of the order of
magnitude $O(c_2^{\rho} \rho!)$. Finally, we put

$$r_h = \sum_{k=1}^{m} a_{hk}\, e^{\omega_k}, \qquad h = 1, \ldots, m,$$

so that

$$r_h = \Omega^{2\rho+1} \rho!\, R_h(1) = O(c_3^{\rho}\, \rho!^{1-m}).$$

The quantities $r_1, \ldots, r_m$ are linear forms in the numbers $e^{\omega_k}$, and
they are linearly independent, in the sense that no nontrivial linear combination
of the vectors $(a_{h1}, \ldots, a_{hm})$, for $1 \le h \le m$, is the zero vector.
This is equivalent to the assertion that $D(1) \ne 0$, which follows
from (20).


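THEOREM 5-7. Suppose that $\omega_1, \ldots, \omega_m$ all lie in an algebraic
number field K of degree g, and let

$$L_h = \sum_{k=1}^{m} b_{hk}\, e^{\omega_k}, \qquad h = 1, \ldots, \mu,$$

be $\mu$ independent linear forms in $e^{\omega_1}, \ldots, e^{\omega_m}$ with coefficients $b_{hk}$ in Z.
Suppose that

$$m\left(1 - \frac{1}{g}\right) < \mu \le m, \tag{21}$$

and put

$$b = \max_{\substack{1 \le h \le \mu \\ 1 \le k \le m}} (|b_{hk}|), \qquad s = \max_{1 \le h \le \mu} (|L_h|).$$

Then to each $\varepsilon > 0$ there corresponds a $b_0(\varepsilon)$ such that $s > b^{-\tau-\varepsilon}$ if
$b > b_0(\varepsilon)$, where

$$\tau = \frac{m \mu g}{\mu g - m(g - 1)} - 1.$$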




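Proof: By a well-known theorem on independence,* the $\mu$ forms
$L_1, \ldots, L_\mu$, together with $m - \mu$ of the forms $r_h$, which we may
designate by $r_{h_1}, \ldots, r_{h_{m-\mu}}$, are independent. Hence the determinant

$$\Delta = \begin{vmatrix}
a_{h_1 1} & \cdots & a_{h_1 m} \\
\vdots & & \vdots \\
a_{h_{m-\mu} 1} & \cdots & a_{h_{m-\mu} m} \\
b_{11} & \cdots & b_{1m} \\
\vdots & & \vdots \\
b_{\mu 1} & \cdots & b_{\mu m}
\end{vmatrix}$$

is not zero; it is obviously a polynomial in $\omega_1, \ldots, \omega_m$ of degree at
most

$$\frac{m(m-1)}{2}\,(2\rho + 1)(m - \mu)$$

with coefficients in Z of the order of magnitude $O(c_4^{\rho} \rho!^{m-\mu} b^{\mu})$. It
follows, first, that there is a rational integer $c_5$ such that $c_5^{\rho} \Delta$ is an
integer of K, and, second, that

$$\overline{|\Delta|} = O((\rho + 1)^{m} c_4^{\rho} \rho!^{m-\mu} b^{\mu} c_6^{\rho}) = O(c_7^{\rho} \rho!^{m-\mu} b^{\mu}),$$

where $c_6$ is an upper bound for the various numbers $\overline{|\omega_k|}$. Hence if
$\Delta = \Delta', \Delta'', \ldots, \Delta^{(g)}$ are the field conjugates of $\Delta$, we have

$$\frac{1}{|\Delta|} = \left| \frac{c_5^{\rho} (c_5^{\rho} \Delta'') \cdots (c_5^{\rho} \Delta^{(g)})}{N(c_5^{\rho} \Delta)} \right| = O(c_8^{\rho}\, \rho!^{(m-\mu)(g-1)}\, b^{\mu(g-1)}).$$

Moreover, using subscripts on $\Delta$ to indicate minors, we have

$$\Delta_{lk} = O(c_9^{\rho} \rho!^{m-\mu-1} b^{\mu}), \qquad \Delta_{lk}\, r_{h_l} = O(c_{10}^{\rho} \rho!^{-\mu} b^{\mu}), \qquad \text{for } 1 \le l \le m - \mu,$$
$$\Delta_{lk} = O(c_{11}^{\rho} \rho!^{m-\mu} b^{\mu-1}), \qquad \Delta_{lk}\, L_{l-m+\mu} = O(c_{12}^{\rho} \rho!^{m-\mu} b^{\mu-1} s), \qquad \text{for } m - \mu + 1 \le l \le m.$$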
* Cf. B. L. van der Waerden, Modern Algebra (English edition, trans- 


lated by Fred Blum from the second revised German edition), New York: 
Frederick Ungar Publishing Co., 1949, Vol. 1, p. 101. 


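Using the identity

$$\Delta e^{\omega_k} = \sum_{l=1}^{m-\mu} (-1)^{l+k} \Delta_{lk}\, r_{h_l} + \sum_{l=m-\mu+1}^{m} (-1)^{l+k} \Delta_{lk}\, L_{l-m+\mu},$$

it follows that

$$1 = O(c_{13}^{\rho}\, \rho!^{m(g-1) - \mu g}\, b^{\mu g}) + O(c_{14}^{\rho}\, \rho!^{(m-\mu) g}\, b^{\mu g - 1} s)$$

or

$$c_{14}^{\rho}\, \rho!^{(m-\mu) g}\, b^{\mu g - 1} s > c_{15} - c_{16}\, c_{13}^{\rho}\, \rho!^{m(g-1) - \mu g}\, b^{\mu g}. \tag{22}$$

From the inequality (21), the exponent $m(g - 1) - \mu g$ is negative.
Hence the quantity

$$\frac{c_{13}^{\rho}}{\rho!^{\mu g - m(g-1)}}$$

may increase for small values of $\rho$, but it tends to zero as $\rho$ increases
indefinitely. At any rate, we can say that for b larger than some $c_{17}$,
the smallest value of $\rho$ for which

$$\frac{c_{15}}{2} > c_{16}\, c_{13}^{\rho}\, \rho!^{m(g-1) - \mu g}\, b^{\mu g}$$

is so large that $c_{13}^{\rho}$ is negligible as compared to the factorial, and for
such $\rho$ we have the asymptotic relation

$$\log \rho! \sim \frac{\mu g}{\mu g - m(g - 1)} \log b.$$

By (22), for $b > b_0(\varepsilon)$,

$$s > b^{-\tau - \varepsilon},$$

where

$$\tau = \frac{(m - \mu) \mu g^2}{\mu g - m(g - 1)} + \mu g - 1 = \frac{m \mu g}{\mu g - m(g - 1)} - 1.$$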

This proves the theorem. 
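
The exponent obtained at the end of the proof can be written in two ways, $(m-\mu)\mu g^2/(\mu g - m(g-1)) + \mu g - 1$ and $m\mu g/(\mu g - m(g-1)) - 1$. The sketch below checks with exact rationals that the two expressions agree whenever the denominator is positive, which is what condition (21) guarantees. The function names are hypothetical, and the formulas are taken as read here rather than as a statement of the author's exact typography.

```python
from fractions import Fraction

def tau_two_term(m, mu, g):
    """(m - mu) * mu * g**2 / d + mu*g - 1, with d = mu*g - m*(g - 1) > 0."""
    d = mu * g - m * (g - 1)
    return Fraction((m - mu) * mu * g * g, d) + mu * g - 1

def tau_closed(m, mu, g):
    """m * mu * g / d - 1, with the same denominator d."""
    d = mu * g - m * (g - 1)
    return Fraction(m * mu * g, d) - 1

def admissible(m, mu, g):
    """Condition (21): m*(1 - 1/g) < mu <= m, written without division."""
    return m * (g - 1) < mu * g and mu <= m
```

For instance, with m = 3, mu = 3, g = 2 both expressions give 5.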


THEOREM 5-8. Suppose that $\vartheta_1, \ldots, \vartheta_N$ are elements of an algebraic
number field K of degree g, and that they are linearly independent
over the rationals, so that no relation of the form

$$d_1 \vartheta_1 + \cdots + d_N \vartheta_N = 0$$

holds with $d_1, \ldots, d_N$ rational and not all zero. Then if the coefficients
$b_{\lambda_1 \cdots \lambda_N}$ in the linear form

$$L = \sum_{\lambda_1=0}^{M_1} \cdots \sum_{\lambda_N=0}^{M_N} b_{\lambda_1 \cdots \lambda_N}\, e^{\lambda_1 \vartheta_1 + \cdots + \lambda_N \vartheta_N} \tag{23}$$

in the quantities $e^{\lambda_1 \vartheta_1 + \cdots + \lambda_N \vartheta_N}$ are rational integers with

$$b = \max (|b_{\lambda_1 \cdots \lambda_N}|),$$

there is a constant $\tau$, depending only on g and N, such that for sufficiently
large b,

$$|L| > b^{-\tau}.$$


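Proof: Let $\mu_1, \ldots, \mu_N$ be positive integers, and consider the
quantities

$$L_{l_1 \cdots l_N} = e^{l_1 \vartheta_1 + \cdots + l_N \vartheta_N} L, \qquad l_1 = 0, 1, \ldots, \mu_1; \; \ldots; \; l_N = 0, 1, \ldots, \mu_N, \tag{24}$$

their number being

$$\mu = (\mu_1 + 1) \cdots (\mu_N + 1).$$

If we introduce the exponential factor in (24) inside the summation
in (23), we see that the various $L_{l_1 \cdots l_N}$ may be regarded as linear
forms in the quantities

$$e^{\omega_{\lambda_1 \cdots \lambda_N}},$$

where

$$\omega_{\lambda_1 \cdots \lambda_N} = \lambda_1 \vartheta_1 + \cdots + \lambda_N \vartheta_N,$$
$$\lambda_1 = 0, 1, \ldots, M_1 + \mu_1; \; \ldots; \; \lambda_N = 0, 1, \ldots, M_N + \mu_N,$$

the number of $\omega$'s being

$$m = (M_1 + \mu_1 + 1) \cdots (M_N + \mu_N + 1).$$

The numbers $\omega_{\lambda_1 \cdots \lambda_N}$ are distinct on account of the independence of
$\vartheta_1, \ldots, \vartheta_N$ over the rationals, so that we can speak of the independence
of the forms $L_{l_1 \cdots l_N}$. To see that as a matter of fact they are
independent, order the subscript sets $\lambda_1 \ldots \lambda_N$ and $l_1 \ldots l_N$ by
interpreting the $\lambda$'s and l's as digits in the base q, for some sufficiently
large q. Then there cannot be a linear relation among the coefficient
vectors of any set of forms, since the $\omega_{\lambda_1 \cdots \lambda_N}$ with largest subscript
occurs only in the form $L_{l_1 \cdots l_N}$ with largest subscript. Finally, there
are positive constants $\alpha$ and $\beta$ which are independent of the coefficients
$b_{\lambda_1 \cdots \lambda_N}$, for which

$$\alpha < \left| \frac{L_{l_1 \cdots l_N}}{L} \right| < \beta.$$

It follows from Theorem 5-7 that if

$$\frac{m}{\mu} = \prod_{i=1}^{N} \frac{M_i + \mu_i + 1}{\mu_i + 1} < \frac{g}{g - 1}, \tag{25}$$

then for $b > b(\varepsilon)$,

$$|L| > b^{-\tau - \varepsilon},$$

where

$$\tau = \frac{g (M_1 + \mu_1 + 1) \cdots (M_N + \mu_N + 1)(\mu_1 + 1) \cdots (\mu_N + 1)}{g (\mu_1 + 1) \cdots (\mu_N + 1) - (g - 1)(M_1 + \mu_1 + 1) \cdots (M_N + \mu_N + 1)} - 1.$$

Condition (25) is satisfied if

$$\mu_i = \left[ \frac{M_i}{\left( \dfrac{2g}{2g - 1} \right)^{1/N} - 1} \right],$$

since then

$$\frac{M_i}{\mu_i + 1} < \left( \frac{2g}{2g - 1} \right)^{1/N} - 1$$

and

$$\prod_{i=1}^{N} \left( 1 + \frac{M_i}{\mu_i + 1} \right) < \frac{2g}{2g - 1} < \frac{g}{g - 1}.$$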
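
The choice of the $\mu_i$ can be tried out numerically. The sketch below assumes the definition $\mu_i = [M_i / ((2g/(2g-1))^{1/N} - 1)]$ and checks condition (25) in the form $\prod_i (M_i + \mu_i + 1)/(\mu_i + 1) < g/(g-1)$ for $g \ge 2$. The N-th root is computed in floating point, so in borderline cases the computed $\mu_i$ may differ by one from the exact integer part; the product itself is evaluated exactly. All names are illustrative.

```python
from fractions import Fraction
from math import floor

def choose_mu(M, g):
    """Integer parts of M_i / ((2g/(2g-1))**(1/N) - 1), using a float N-th root."""
    N = len(M)
    bound = (2 * g / (2 * g - 1)) ** (1 / N) - 1
    return [floor(Mi / bound) for Mi in M]

def product_ratio(M, mu):
    """Exact value of the product of (M_i + mu_i + 1) / (mu_i + 1)."""
    ratio = Fraction(1)
    for Mi, mui in zip(M, mu):
        ratio *= Fraction(Mi + mui + 1, mui + 1)
    return ratio

def satisfies_25(M, g):
    """Condition (25) for g >= 2 with the mu_i chosen as above."""
    return product_ratio(M, choose_mu(M, g)) < Fraction(g, g - 1)
```

For example, `satisfies_25([1, 2], 2)` chooses a pair of $\mu_i$ and compares the resulting product with g/(g - 1) = 2.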
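With this choice of $\mu_i$ we have

$$\tau = \frac{\mu \prod_{i=1}^{N} \left( 1 + \dfrac{M_i}{\mu_i + 1} \right)}{1 - \dfrac{g-1}{g} \prod_{i=1}^{N} \left( 1 + \dfrac{M_i}{\mu_i + 1} \right)} - 1
< \frac{\mu \cdot \dfrac{2g}{2g - 1}}{1 - \dfrac{g-1}{g} \cdot \dfrac{2g}{2g - 1}} - 1 = 2g\mu - 1. \tag{26}$$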


Since $g \ge 1$ and $N \ge 1$, we have $\mu_i \ge M_i \ge 1$. Since $[x] + 1 \le 2x$
for $x \ge 1$, we have




and we have the theorem with 


Taking N = 1, we have

COROLLARY 1. If $\vartheta \ne 0$ is algebraic, $e^{\vartheta}$ is an S-number, and in
particular is transcendental.

For $\vartheta = \pi i$, $e^{\vartheta} = -1$ is not transcendental. Hence

COROLLARY 2. $\pi$ is transcendental.

We also have the following result, first proved by F. Lindemann
in 1882.

COROLLARY 3. If $\vartheta_1, \ldots, \vartheta_N$ are algebraic and are linearly independent
over the rationals, then $e^{\vartheta_1}, \ldots, e^{\vartheta_N}$ are algebraically independent
over the field of algebraic numbers, that is, there is no polynomial
$P(z_1, \ldots, z_N)$ with algebraic coefficients not all zero for which

$$P(e^{\vartheta_1}, \ldots, e^{\vartheta_N}) = 0.$$

Finally, for N = 1 the brackets can be omitted in the definition of
$\mu_1$; then $\mu_1 = (2g - 1)M_1$ and

$$\tau < 2g\mu - 1 \le 2g((2g - 1)M_1 + 1) - 1 = 2g(2g - 1)M_1 + 2g - 1.$$

COROLLARY 4. If $\vartheta \ne 0$ is algebraic of degree g, then the function

$$\varphi(n, t) = t^{-2g(2g-1)n - 2g + 1}$$

is a transcendence measure for $e^{\vartheta}$.


5-6 A theorem of Schneider. In addition to the Liouville numbers 
and values of the exponential function, many other specific numbers 
are known to be transcendental. To indicate the type of results 
known, we mention the following: 

(a) The Bessel functions $J_0(x)$ and $J_0'(x)$ are transcendental for
algebraic $x \ne 0$.

(b) If $\alpha$ and $\beta$ are algebraic, $\alpha \ne 0$ or 1, and $\beta$ is irrational, then $\alpha^{\beta}$
is transcendental. (In particular, $e^{\pi} = (-1)^{-i}$ is included.)

(c) At least one of the numbers $g_2, g_3, \omega_1, \omega_2$ associated with a
Weierstrass $\wp$-function is transcendental, and if $g_2$ and $g_3$ are algebraic,
at least one of z and $\wp(z)$ is transcendental.

(d) If f(x) is a polynomial whose value is in Z for argument in Z,
and f(x) > 0 for x > 0, then the number

$$0.f(1)f(2)f(3)\ldots,$$

formed by juxtaposing the decimal representations of the values f(x),
is transcendental. (An example is the number 0.1361015...,
generated by $f(x) = (x^2 + x)/2$.)

(e) If $\omega$ is a positive quadratic irrationality, then the number

$$\sum_{n=0}^{\infty} [n\omega]\, z^n$$

is transcendental for algebraic $z \ne 0$.
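
The decimal in item (d) is easy to generate. The following sketch builds the leading digits of $0.f(1)f(2)f(3)\ldots$ as a string for the example $f(x) = (x^2 + x)/2$; the helper names are illustrative only.

```python
def juxtaposed_decimal(f, terms):
    """Leading digits of 0.f(1)f(2)...f(terms), as a string."""
    return "0." + "".join(str(f(x)) for x in range(1, terms + 1))

def triangular(x):
    """The polynomial f(x) = (x**2 + x)/2, which is integer-valued on Z."""
    return (x * x + x) // 2
```

With five terms, the values 1, 3, 6, 10, 15 are concatenated, which reproduces the digits 0.1361015 quoted in the example.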
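On the other hand, it is not known whether the following numbers
are transcendental:

(a) $\gamma = \lim_{n \to \infty} \left( 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n \right)$;

(b) $\zeta(2n + 1) = \sum_{k=1}^{\infty} \frac{1}{k^{2n+1}}$,

(c) $\Gamma(x)$ for algebraic x not in Z,

(d) $e + \pi$, $e\pi$.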

The methods used to prove what little is known about specific 
transcendental numbers show considerable variety, both in technique 
and conception. T. Schneider has recently shown, however, that 
several results which earlier required separate proofs can all be ob- 
tained from a single theorem. This theorem says nothing directly 
about transcendental numbers; rather, its sense is that if several 
transcendental functions assume algebraic values at a large number 
of points, then they must either have large rates of growth or be 
algebraically dependent (as functions). The prototype of Schneider’s 
result, proved by G. Pólya in 1920, asserts that if f is an integral
transcendental function which assumes values in Z for z = 0, 1, 2, ...,
then

$$\limsup_{r \to \infty} \frac{M(r)}{2^r} \ge 1,$$

where, as usual,

$$M(r) = \max_{|z| = r} (|f(z)|).$$


There have been many refinements and extensions of Polya’s work, 
of course; we mention only that by A. Gelfond in 1929, where this 
kind of theorem was first used for transcendence investigations. (His
result was that $\alpha^{\beta}$ is transcendental for algebraic $\alpha \ne 0, 1$, if $\beta$ is an
imaginary quadratic irrationality.)

In this section we shall prove Schneider’s theorem, and in the next 
we shall apply it to the numbers $\alpha^{\beta}$. (The facts mentioned above
concerning the $\wp$-function can also be deduced, but the requisite
preliminaries preclude doing so here.) Since the statement of the 
theorem is complicated, we first introduce some notation. 

By the order of an entire function f(z) we mean, as usual, the
quantity

$$\rho = \limsup_{R \to \infty} \frac{\log \log M(R)}{\log R};$$

if f(z) is of order $\rho$, then

$$f(z) = O(e^{R^{\rho + \varepsilon}})$$

as $|z| = R \to \infty$, for every fixed $\varepsilon > 0$. Let $\zeta_1, \zeta_2, \ldots$ be an infinite
sequence of complex numbers. Designate by

$$z_0(m) = z_0, \; \ldots, \; z_k(m) = z_k$$

the distinct numbers among $\zeta_1, \ldots, \zeta_m$, and by $l_\kappa(m) + 1 = l_\kappa + 1$
the multiplicity of occurrence of $z_\kappa$ among $\zeta_1, \ldots, \zeta_m$. Thus

$$\sum_{\kappa=0}^{k} (l_\kappa + 1) = m. \tag{27}$$

Let r(m) = r be the radius of the smallest circle about the origin
which contains $z_0, \ldots, z_k$, and put

$$\alpha = \liminf_{m \to \infty} \frac{\log m}{\log r(m)}, \tag{28}$$

so that $\alpha \le \infty$. Let

$$l = \max (l_0, \ldots, l_k).$$

Finally, let K be a fixed algebraic number field of degree g, and, as
always, let $\overline{|a|}$ be the maximum of the absolute values of the conjugates
of a, for a in K.


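THEOREM 5-9. Let $f_1(z), \ldots, f_n(z)$ be meromorphic functions with
the property that for each m, the numbers

$$f_\nu^{(\lambda)}(z_\kappa), \qquad \lambda = 0, \ldots, l_\kappa, \quad \kappa = 0, \ldots, k, \quad \nu = 1, \ldots, n,$$

are in K. Let $H_\nu(z_\kappa)$ be positive rational integers such that all the
numbers

$$H_\nu(z_\kappa)\, f_\nu^{(\lambda)}(z_\kappa), \qquad \lambda = 0, \ldots, l_\kappa, \quad \kappa = 0, \ldots, k, \quad \nu = 1, \ldots, n,$$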
are integers in K. Suppose that 


$$l < \gamma\, \frac{m}{\log m}. \tag{29}$$


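For each $\nu$, if $f_\nu(z)$ is entire let it be of order $\rho_\nu$, and otherwise suppose
that there is an entire function $G_\nu(z)$ of order $\rho_\nu$ such that $G_\nu(z) f_\nu(z)$ is
entire and also of order $\rho_\nu$. Suppose that

$$\frac{\rho_1 + \cdots + \rho_n}{n - 1} < \alpha, \tag{30}$$

and put

$$\eta_\nu = \frac{\rho_\nu}{\alpha}, \qquad \nu = 1, \ldots, n.$$

Suppose finally that

$$\limsup_{m \to \infty} \frac{\log \log \max_{0 \le \kappa \le k} (|G_\nu(z_\kappa)|^{-1})}{\log m} \le \eta_\nu, \qquad \nu = 1, \ldots, n, \tag{31}$$

and

$$\limsup_{m \to \infty} \frac{\log \log \max_{\substack{0 \le \kappa \le k \\ 0 \le \lambda \le l_\kappa}} \left( \overline{|f_\nu^{(\lambda)}(z_\kappa)|},\, H_\nu(z_\kappa) \right)}{\log m} \le \eta_\nu, \qquad \nu = 1, \ldots, n. \tag{32}$$

Then $f_1, \ldots, f_n$ are algebraically dependent over K.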
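Proof: We form a polynomial

$$\Phi(z) = \sum_{\tau_1=0}^{t_1} \cdots \sum_{\tau_n=0}^{t_n} C_{\tau_1 \cdots \tau_n}\, f_1^{\tau_1}(z) \cdots f_n^{\tau_n}(z),$$

and seek to determine the coefficients $C_{\tau_1 \cdots \tau_n}$ so that $\Phi$ has a zero of
order $l_\kappa + 1$ at $z_\kappa$, for $\kappa = 0, \ldots, k$. Here the numbers k, $z_\kappa$, and $l_\kappa$
are all defined in terms of the sequence $\zeta_1, \zeta_2, \ldots$ and an index m, as
explained earlier; m is fixed, and will be specified more exactly later.
The conditions imposed on $\Phi$ require that all the numbers

$$\Phi^{(\lambda)}(z_\kappa), \qquad \lambda = 0, \ldots, l_\kappa; \quad \kappa = 0, \ldots, k,$$

shall vanish, and this in turn yields a set of m homogeneous linear
equations of the form

$$\sum \omega_\mu\, C_{\tau_1 \cdots \tau_n} = 0, \qquad \mu = 1, \ldots, m, \tag{33}$$

in the $t = (t_1 + 1) \cdots (t_n + 1)$ unknowns $C_{\tau_1 \cdots \tau_n}$. (Of course, the
numbers $\omega_\mu$ also depend on $\tau_1, \ldots, \tau_n$.) We put

$$t_\nu = \left[ (2m)^{(1 + \eta_1 + \cdots + \eta_n - n\eta_\nu)/n} \right], \qquad \nu = 1, \ldots, n, \tag{34}$$

so that

$$t = (t_1 + 1) \cdots (t_n + 1) \ge \prod_{\nu=1}^{n} (2m)^{(1 + \eta_1 + \cdots + \eta_n - n\eta_\nu)/n} = 2m. \tag{35}$$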

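The coefficients $\omega_\mu$ in equations (33) are by assumption numbers in
K, and after multiplication by the rational integral factor

$$\prod_{\nu=1}^{n} (H_\nu(z_\kappa))^{t_\nu}$$

they become integers, say $\Omega_\mu$, of K. The size of the coefficients $\Omega_\mu$ is
determined in part by this numerical factor, in part by the values of
the $f_\nu$ and their derivatives at the various points $z_\kappa$, and in part by
the numerical coefficients introduced by differentiation. In the estimate
(36) below, the second of these is accounted for by (32). The
third depends only on the set of exponents $\tau_1, \ldots, \tau_n$ and the order
of the derivative considered, and so can be computed from the fact
that the sum of all the coefficients in the expansion of

$$\frac{d^{\lambda}}{dz^{\lambda}} \left( z_1^{\tau_1} \cdots z_n^{\tau_n} \right)$$

by the product formula is $(\tau_1 + \cdots + \tau_n)^{\lambda}$.
We thus obtain the bound

$$\overline{|\Omega_\mu|} \le \prod_{\nu=1}^{n} H_\nu(z_\kappa)^{t_\nu} \prod_{\nu=1}^{n} \exp (t_\nu m^{\eta_\nu + \varepsilon_\nu}) \cdot \left( \sum_{\nu=1}^{n} t_\nu \right)^{\lambda}, \tag{36}$$

where $\varepsilon_\nu > 0$ and $\varepsilon_\nu \to 0$ as $m \to \infty$. (Hereafter we designate any
quantity with the latter properties by $\varepsilon$, and any positive integer
independent of m, l, and r by $\gamma$.)

It follows from the inequality (30) and the definition of $\eta_\nu$ that

$$\eta_1 + \cdots + \eta_n < n - 1, \tag{37}$$

so that, by (34),

$$\left( \sum_{\nu=1}^{n} t_\nu \right)^{l} < (\gamma m)^{l}.$$

Using this, together with (32) and (36), we obtain

$$\overline{|\Omega_\mu|} < (\gamma m)^{l} \exp \left\{ \sum_{\nu=1}^{n} 2 t_\nu m^{\eta_\nu + \varepsilon} \right\} < \gamma^{m} m^{l}\, \gamma^{m^{(1 + \eta_1 + \cdots + \eta_n)/n + \varepsilon}}.$$

By (29), $m^{l} < \gamma^{m}$. By (37), $\eta_1 + \cdots + \eta_n = n - 1 - \delta$ with $\delta$ a
positive constant; hence

$$\frac{1}{n}(1 + \eta_1 + \cdots + \eta_n) + \varepsilon = \frac{n - \delta}{n} + \varepsilon = 1 - \frac{\delta}{n} + \varepsilon.$$

We henceforth require m to be so large that

$$\varepsilon < \frac{\delta}{n}. \tag{38}$$

Then

$$\overline{|\Omega_\mu|} < \gamma^{m}. \tag{39}$$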


Using this and (35), we shall now show that there are coefficients
$C_{\tau_1 \cdots \tau_n}$ satisfying (33) which are integers in K, are not all zero, and are
such that

$$\overline{|C_{\tau_1 \cdots \tau_n}|} < \gamma^{m}. \tag{40}$$

To simplify the notation, arrange the $C_{\tau_1 \cdots \tau_n}$ in some fixed linear
order, and rewrite (33) in the form

$$\sum_{\tau=1}^{t} \Omega_{\mu\tau} C_\tau = 0, \qquad \mu = 1, \ldots, m.$$

Let $\rho_1, \ldots, \rho_g$ be an integral basis for K, let $h \in Z$ be positive, and
let B be the set of integers in K of the form

$$b_1 \rho_1 + \cdots + b_g \rho_g$$

where the b's range independently over the rational integers such that
$|b| \le h$. For each set of elements $X_1, \ldots, X_t$ of B, put

$$y_\mu = \sum_{\tau=1}^{t} \Omega_{\mu\tau} X_\tau, \qquad \mu = 1, \ldots, m.$$

This defines $(2h + 1)^{gt}$ m-tuples $y_1, \ldots, y_m$, not necessarily different
from one another. Also, since

$$\overline{|X_\tau|} < \gamma h, \tag{41}$$

we have from (39) that

$$\overline{|y_\mu|} < \gamma^{m} h. \tag{42}$$

Each number $y_\mu$ has a basis representation $c_1 \rho_1 + \cdots + c_g \rho_g$;
similar representations, with the same $c_i$ and with the $\rho_i$ replaced by
their conjugates, hold for the conjugates of $y_\mu$. The determinant
formed from the $\rho_i$ and their conjugates is not zero, so that it is
possible to solve the g equations defining $y_\mu$ and its conjugates for the
numbers $c_i$, giving each $c_i$ as a linear expression in the conjugates of
$y_\mu$, with coefficients depending only on K. From (42) it follows that
for $i = 1, \ldots, g$,

$$|c_i| \le \gamma^{m} h. \tag{43}$$

There are, however, exactly $(2\gamma^{m} h + 1)^{g}$ different integers of K
whose basis representation satisfies (43); therefore there are at most
$(2\gamma^{m} h + 1)^{gm}$ different systems $y_1, \ldots, y_m$. If

$$(2\gamma^{m} h + 1)^{gm} < (2h + 1)^{gt}, \tag{44}$$

then two systems $y_1, \ldots, y_m$ corresponding to two different sets
$X_1, \ldots, X_t$ coincide, and the respective differences $X_1 - X_1', \ldots,
X_t - X_t'$ constitute a solution of (33). These differences, which we
call $C_1, \ldots, C_t$, are not all zero, and by (41), they satisfy the
condition

$$\overline{|C_\tau|} < \gamma h.$$

By (35), $t \ge 2m$, so (44) holds if

$$(2\gamma^{m} h + 1)^{m} < (2h + 1)^{2m},$$

which is clearly true if $h = \gamma^{m}$. But then (40) holds.
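
The counting inequality behind this pigeonhole argument can be checked with exact integer arithmetic for small parameters. The sketch below assumes the inequality in the form $(2\gamma^m h + 1)^{gm} < (2h + 1)^{gt}$ with $h = \gamma^m$; the function name and the sample values are illustrative.

```python
def pigeonhole_condition(gamma, m, g, t):
    """True if there are more coefficient choices than possible image systems,
    that is, (2*gamma**m*h + 1)**(g*m) < (2*h + 1)**(g*t) with h = gamma**m."""
    h = gamma ** m
    images = (2 * gamma ** m * h + 1) ** (g * m)
    choices = (2 * h + 1) ** (g * t)
    return images < choices
```

With t = 2m the condition holds, since $2\gamma^{2m} + 1 < (2\gamma^m + 1)^2$; with only t = m unknowns it fails, which shows why at least 2m unknowns are wanted.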




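We now designate by $m_0$ a fixed value of m such that (38) holds,
and by $k_0$ and $l_\kappa^0$ the corresponding values of k and $l_\kappa$, and define $\Phi$
to be a fixed function corresponding to $m_0$ and having all the properties
described up to this point. We are now able to perform an
induction.

We know that $\Phi$ possesses $m_0$ zeros, if each is counted with its
proper multiplicity. It is asserted that if $m_0$ is sufficiently large, then
$\Phi(z)$ vanishes at all the points $\zeta_1, \zeta_2, \ldots$. This is proved inductively
by showing that if $\Phi(z) = 0$ for $z = \zeta_1, \ldots, \zeta_m$, with $m \ge m_0$, then
also $\Phi(\zeta_{m+1}) = 0$, if $m_0$ is sufficiently large. More precisely, we
assume that $\Phi$ has a zero at $z_\kappa$ ($\kappa = 0, \ldots, k$) of order $l_\kappa + 1$, with

$$\sum_{\kappa=0}^{k} (l_\kappa + 1) = m \ge m_0,$$

and shall deduce that $\Phi$ has a zero at $\zeta_{m+1} = \zeta$ of order $\Lambda + 1$, where

$$\Lambda = \begin{cases} 0 & \text{if } \zeta = z_{k+1} \ne z_\kappa \text{ for } \kappa = 0, \ldots, k, \\ l_\sigma + 1 & \text{if } \zeta = z_\sigma \text{ and } 0 \le \sigma \le k. \end{cases}$$

Here k = k(m) and $l_\kappa = l_\kappa(m)$.
Put

$$G(z) = \prod_{\nu=1}^{n} G_\nu^{t_\nu}(z);$$

then $G(z)\Phi(z)$ is an entire function which vanishes at the same points
$z_\kappa$ as $\Phi(z)$, and to the same order, by (31). We also put

$$Q(z) = \prod_{\substack{\kappa=0 \\ z_\kappa \ne \zeta}}^{k} (z - z_\kappa)^{l_\kappa + 1}.$$

By Cauchy's theorem,

$$\left[ \frac{d^{\Lambda} (G(z)\Phi(z)/Q(z))}{dz^{\Lambda}} \right]_{z = \zeta} = \frac{\Lambda!}{2\pi i} \int_{\Gamma} \frac{G(z)\Phi(z)}{Q(z)}\, \frac{dz}{(z - \zeta)^{\Lambda + 1}}. \tag{45}$$

Here $\Gamma$ is the circle

$$|z| = R_1 = R^{\vartheta}, \qquad \vartheta > 1,$$

where R = r(m + 1) if $\alpha < \infty$ (we recall that r(m) was defined
earlier as the radius of the smallest circle about O containing
$z_0, \ldots, z_k$), while if $\alpha = \infty$, R is so chosen that

$$R > r(m + 1), \qquad \lim \frac{\log m}{\log R} = \infty, \qquad \lim R = \infty.$$

Since $\Phi$ has a zero of order $\Lambda$ at $\zeta$, the left side of (45) is simply

$$\Phi^{(\Lambda)}(\zeta) \left[ \frac{G(z)}{Q(z)} \right]_{z = \zeta},$$

so that

$$\Phi^{(\Lambda)}(\zeta) = \frac{Q(\zeta)}{G(\zeta)}\, \frac{\Lambda!}{2\pi i} \int_{\Gamma} \frac{G(z)\Phi(z)}{Q(z)}\, \frac{dz}{(z - \zeta)^{\Lambda + 1}}. \tag{46}$$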


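We shall use this representation to estimate $|\Phi^{(\Lambda)}(\zeta)|$. By the
inequality (40), we have

$$\max_{|z| = R_1} |G(z)\Phi(z)| < t \gamma^{m} \exp \left( \sum_{\nu=1}^{n} t_\nu R_1^{\rho_\nu + \varepsilon} \right),$$

where $\varepsilon \to 0$ as $R_1 \to \infty$, or equivalently as $m \to \infty$. By the definition
(28) of $\alpha$,

$$R_1 = R^{\vartheta} < m^{\vartheta/(\alpha - \varepsilon)}$$

or

$$R_1 < m^{\vartheta/\alpha + \varepsilon},$$

even for $\alpha = \infty$. Hence

$$\max_{|z| = R_1} |G(z)\Phi(z)| < \gamma^{m} \exp \left( \sum_{\nu=1}^{n} t_\nu m^{\vartheta \eta_\nu + \varepsilon} \right),$$

since it is easily seen from the definition (34) that $t < \gamma m$. From
(34) we also have that

$$\sum_{\nu=1}^{n} t_\nu m^{\vartheta \eta_\nu + \varepsilon} \le \sum_{\nu=1}^{n} (2m)^{(1 + \eta_1 + \cdots + \eta_n)/n - \eta_\nu}\, m^{\vartheta \eta_\nu + \varepsilon}$$
$$\le \sum_{\nu=1}^{n} 2 m^{(1 + \eta_1 + \cdots + \eta_n)/n - \eta_\nu + \vartheta \eta_\nu + \varepsilon}$$
$$\le \sum_{\nu=1}^{n} 2 m^{1 - (\delta/n) + (\vartheta - 1)\eta_\nu + \varepsilon}.$$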
We may suppose, with no loss in generality, that each $\eta_\nu \le 1$.
For suppose that $\eta_n$, say, is larger than 1. Then in analogy with (30),

$$\frac{\rho_1 + \cdots + \rho_{n-1}}{n - 2} < \alpha,$$

and all hypotheses of the theorem are satisfied by the n - 1 functions
$f_1(z), \ldots, f_{n-1}(z)$. But if $f_1(z), \ldots, f_{n-1}(z)$ can be shown to be
algebraically dependent, then $f_1(z), \ldots, f_n(z)$ must also be dependent.
Consequently, if we put $\vartheta = 1 + \delta/2n$, we have

$$(\vartheta - 1) \max_{1 \le \nu \le n} \eta_\nu \le \frac{\delta}{2n}.$$

If $m_0$, and hence also $m \ge m_0$, is so large that

$$\varepsilon < \frac{\delta}{2n}, \tag{47}$$

then

$$\sum_{\nu=1}^{n} 2 m^{1 - (\delta/n) + (\vartheta - 1)\eta_\nu + \varepsilon} < nm,$$

and

$$\max_{|z| = R_1} |G(z)\Phi(z)| < \gamma^{m} e^{nm} < \gamma^{m}. \tag{48}$$

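Continuing the estimation of the right side of (46), we notice that
since $\vartheta > 1$, and since R grows indefinitely with m, it is possible to
choose $m_0$ so large that

$$\min_{|z| = R_1} |z - z_\kappa| > \frac{R_1}{2} \tag{49}$$

for $\kappa = 0, \ldots, k$, and also

$$\min_{|z| = R_1} |z - \zeta| > \frac{R_1}{2}. \tag{50}$$

In that case,

$$\min_{|z| = R_1} |Q(z)(z - \zeta)^{\Lambda + 1}| > \left( \frac{R_1}{2} \right)^{m+1}. \tag{51}$$

Since $\zeta$ and all the $z_\kappa$ lie in the disk $|z| \le R$, we have

$$|\zeta - z_\kappa| \le 2R,$$

so that

$$|Q(\zeta)| \le (2R)^{m}. \tag{52}$$

Finally, we see from (31) that

$$|G(\zeta)| > \exp \left( -\sum_{\nu=1}^{n} t_\nu m^{\eta_\nu + \varepsilon} \right) > \gamma^{-m}. \tag{53}$$

Combining the relations (48), (51), (52), and (53), we have

$$|\Phi^{(\Lambda)}(\zeta)| < (2R)^{m} \gamma^{m} \Lambda!\, \gamma^{m} \left( \frac{R_1}{2} \right)^{-m-1} R_1$$
$$< \gamma^{m} \Lambda^{\Lambda} \left( \frac{R}{R_1} \right)^{m}$$
$$< \gamma^{m} \Lambda^{\Lambda} R^{-(\vartheta - 1)m}.$$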




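Since by hypothesis

$$\max_{0 \le \kappa \le k} l_\kappa < \gamma\, \frac{m + 1}{\log (m + 1)},$$

the inequality

$$\Lambda^{\Lambda} < \gamma^{m}$$

holds; hence

$$|\Phi^{(\Lambda)}(\zeta)| < \gamma^{m} R^{-(\vartheta - 1)m}. \tag{54}$$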


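Recalling that $\Phi(z)$ is a polynomial in $f_1(z), \ldots, f_n(z)$ with coefficients
$C_\tau$ which are integers in K, and that all the derivatives of
$f_1(z), \ldots, f_n(z)$ up to order $\Lambda$ have values in K for $z = \zeta$, we see
that also $\Phi^{(\Lambda)}(\zeta)$ is a number in K, and that the product

$$\Phi^{(\Lambda)}(\zeta) \prod_{\nu=1}^{n} H_\nu^{t_\nu}(\zeta)$$

is an integer in K. By the same reasoning as was used in producing
the estimate (36), we have

$$\overline{\left| \prod_{\nu=1}^{n} H_\nu^{t_\nu}(\zeta)\, \Phi^{(\Lambda)}(\zeta) \right|} < \gamma^{m} \cdot \prod_{\nu=1}^{n} H_\nu^{t_\nu}(\zeta) \cdot \left( \sum_{\nu=1}^{n} t_\nu \right)^{\Lambda} \prod_{\nu=1}^{n} \exp (t_\nu m^{\eta_\nu + \varepsilon}) \cdot \prod_{\nu=1}^{n} (t_\nu + 1).$$

The factor $\gamma^{m}$ comes from the estimate (40) for $\overline{|C_\tau|}$, while the last
product is the total number of terms in $\Phi(z)$ itself, which was an
unnecessary factor in (36), where we were estimating the terms in a
derivative arising from a single term in $\Phi(z)$. By arguments used
previously, it follows from the last inequality that

$$\overline{\left| \prod_{\nu=1}^{n} H_\nu^{t_\nu}(\zeta)\, \Phi^{(\Lambda)}(\zeta) \right|} < \gamma^{m}.$$

Combining this with (54), we have

$$\left| N \left( \Phi^{(\Lambda)}(\zeta) \prod_{\nu=1}^{n} H_\nu^{t_\nu}(\zeta) \right) \right| < \gamma^{m} R^{-(\vartheta - 1)m}, \tag{55}$$

and the upper bound here is smaller than 1 for m sufficiently large,
say $m > m_1$. Hence if $m_0$ is so large that $m_0 > m_1$ and the inequalities
(47), (49), and (50) hold, then it follows from (55) that

$$\Phi^{(\Lambda)}(\zeta) = 0,$$

as asserted.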




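To complete the proof of Theorem 5-9, we shall make use of the
following general considerations. Let $\varphi(z)$ be analytic in the disk
bounded by a circle $\Gamma$, and let $z_1, \ldots, z_p$ be interior points of this
disk. Then $\varphi(z)$ has an expansion

$$\varphi(z) = a_0 + a_1 (z - z_1) + a_2 (z - z_1)(z - z_2) + \cdots$$
$$+\, a_{p-1} (z - z_1) \cdots (z - z_{p-1}) + (z - z_1) \cdots (z - z_p) R_p(z) \tag{56}$$

with constants $a_0, \ldots, a_{p-1}$ and a function $R_p(z)$ regular in the disk.
In fact, if we put

$$\frac{1}{2\pi i} \int_{\Gamma} \frac{\varphi(t)\, dt}{(t - z_1) \cdots (t - z_{q+1})} = a_q \tag{57}$$

for $q = 0, \ldots, p - 1$, and

$$\frac{1}{2\pi i} \int_{\Gamma} \frac{\varphi(t)\, dt}{(t - z)(t - z_1) \cdots (t - z_p)} = R_p(z), \tag{58}$$

then

$$\varphi(z) - (a_0 + a_1 (z - z_1) + \cdots + a_{p-1} (z - z_1) \cdots (z - z_{p-1}))$$
$$= \frac{1}{2\pi i} \int_{\Gamma} \left( \frac{1}{t - z} - \frac{1}{t - z_1} - \frac{z - z_1}{(t - z_1)(t - z_2)} - \cdots - \frac{(z - z_1) \cdots (z - z_{p-1})}{(t - z_1) \cdots (t - z_p)} \right) \varphi(t)\, dt$$
$$= \frac{1}{2\pi i} \int_{\Gamma} \frac{(z - z_1) \cdots (z - z_p)}{(t - z)(t - z_1) \cdots (t - z_p)}\, \varphi(t)\, dt$$
$$= (z - z_1) \cdots (z - z_p) R_p(z),$$

and it is clear from (58) that $R_p(z)$ is regular inside $\Gamma$.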
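
For distinct interpolation points, the contour integrals defining the coefficients $a_q$ can be evaluated by residues and coincide with Newton's divided differences. The following sketch takes that identification as its starting assumption and verifies the expansion exactly for a cubic polynomial and four nodes, in which case the remainder term vanishes identically. Names and sample data are illustrative.

```python
from fractions import Fraction

def divided_differences(points, phi):
    """Newton coefficients a_0, ..., a_{p-1} for distinct nodes z_1, ..., z_p."""
    table = [Fraction(phi(z)) for z in points]
    coeffs = [table[0]]
    for order in range(1, len(points)):
        # Each pass replaces the table by the next column of divided differences.
        table = [(table[i + 1] - table[i]) / (points[i + order] - points[i])
                 for i in range(len(table) - 1)]
        coeffs.append(table[0])
    return coeffs

def newton_eval(points, coeffs, z):
    """Evaluate a_0 + a_1 (z - z_1) + a_2 (z - z_1)(z - z_2) + ... at z."""
    total, basis = Fraction(0), Fraction(1)
    for zq, aq in zip(points, coeffs):
        total += aq * basis
        basis *= (z - zq)
    return total

def cubic(z):
    """Sample function: a polynomial of degree 3, below the number of nodes."""
    return z ** 3 - 2 * z + 5
```

With nodes 0, 1, 3, 4 the coefficients come out as 5, -1, 4, 1, and the Newton form reproduces the cubic at every point, not only at the nodes.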
We apply this with $\varphi(z) = \Phi(z)G(z)$, the "interpolation points"
$z_1, \ldots, z_p$ being $\zeta_1, \ldots, \zeta_m$, in this order. For $\Gamma$ we choose the
circle $|z| = R_1 = R^{\vartheta}$ with $\vartheta > 1$, where R > r(m), $\lim R = \infty$, and

$$\liminf_{m \to \infty} \frac{\log m}{\log R} = \alpha.$$

Since $\Phi(z)$ vanishes at all the points $\zeta_1, \zeta_2, \ldots$, the integrand in the
expression (57) for $a_q$ is regular in $\Gamma$, so that

$$a_q = 0 \qquad \text{for } q = 0, \ldots, m - 1.$$




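Hence for fixed z with $z \ne \zeta_1, \zeta_2, \ldots$,

$$G(z)\Phi(z) = \lim_{m \to \infty} \left( \frac{Q_m(z)}{2\pi i} \int_{\Gamma} \frac{G(t)\Phi(t)}{Q_m(t)}\, \frac{dt}{t - z} \right),$$

where

$$Q_m(z) = \prod_{\kappa=0}^{k} (z - z_\kappa)^{l_\kappa + 1}, \qquad \sum_{\kappa=0}^{k} (l_\kappa + 1) = m.$$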

As in the derivation of (54), we have 


IG @)®@)| Sy" Mt (t, + 1) exp (= bey) (2r)™ (4) 
as y=1 


<7 () yh oe 
fy 

Since this inequality holds for arbitrarily large m, and since R in- 

creases indefinitely with m while + does not, it must be that 


$$G(z)\Phi(z) = 0.$$


Hence $\Phi(z)$ vanishes for all z, which is the assertion of the theorem.


5-7 The Hilbert-Gelfond-Schneider theorem. As an application 
of Schneider’s theorem of the preceding section, we now prove 


THEOREM 5-10. If a and b are algebraic numbers, b is irrational,
and a is neither 0 nor 1, then $a^b$ is transcendental.


This theorem settles a question raised by Euler concerning the 
arithmetic nature of the logarithm of a rational number to a rational 
base, and repeated in more general form in the seventh of Hilbert’s 
famous list of 23 outstanding problems which seemed to him to be 
both difficult and important. The list appeared in 1900, but it was 
not until 1929 that Gelfond made the first contribution to the solution 
of this problem. Further partial results were obtained by Kusmin, 
Siegel, and Boehle, and in 1934 complete proofs were given almost 
simultaneously by Gelfond and Schneider. As mentioned earlier, the 
proof to be given now is most nearly in the spirit of Gelfond’s 1929 
paper; it should be instructive to the reader also to examine the 
original complete proofs by Gelfond and Schneider. 

We apply Theorem 5-9 with n = 2, $f_1(z) = a^z$, $f_2(z) = z$, and
$\zeta_m = u + vb$, where u and v range over the positive integers. On
account of the irrationality of b, the numbers $\zeta_m$ are distinct; they
are to be ordered by the size of u + v, and otherwise arbitrarily.
Suppose that in the sequence $\zeta_1, \ldots, \zeta_m$, all the numbers occur for
which $u + v \le d$, and possibly some (but not all) of those for which
u + v = d + 1. Then clearly

$$\frac{d(d - 1)}{2} \le m < \frac{d(d + 1)}{2},$$

while

$$r = \max (|u + vb|) \le (d + 1)(1 + |b|),$$

and (taking u = d - 1, v = 1),

$$r \ge d - 1 - |b|.$$

These inequalities show that $\gamma_1 r^2 < m < \gamma_2 r^2$, for some positive $\gamma_1$
and $\gamma_2$. Thus

$$\alpha = \lim_{m \to \infty} \frac{\log m}{\log r} = 2.$$
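
The growth rate of the point set can be observed numerically. The sketch below enumerates the numbers u + vb with $u, v \ge 1$ and $u + v \le d$, counts them, and computes the radius r of the smallest disk about the origin containing them. The value $b = \sqrt{2}$ is only a convenient real stand-in for an irrational algebraic b, and the function names are illustrative. The ratio log m / log r approaches 2 slowly, as the constants $\gamma_1$, $\gamma_2$ suggest.

```python
from math import log, sqrt

def lattice_stats(d, b):
    """Return (m, r) for the points u + v*b with u, v >= 1 and u + v <= d."""
    points = [(u, v) for u in range(1, d) for v in range(1, d - u + 1)]
    m = len(points)
    r = max(abs(u + v * b) for u, v in points)
    return m, r

def growth_ratio(d, b):
    """The quotient log m / log r for the points counted by lattice_stats."""
    m, r = lattice_stats(d, b)
    return log(m) / log(r)
```

For d = 50 there are 50 * 49 / 2 = 1225 points, matching the lower bound d(d - 1)/2 for m.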
By the choice of $f_1(z)$ and $f_2(z)$, $\rho_1 = 1$ and $\rho_2 = 0$, and the
inequality (30) holds. Since $a^z$ and z are entire, (31) is without force.
If we suppose that z and $a^z$ are elements of an algebraic number field
K for z = b, then $f_2(\zeta) = u + vb$ and $f_1(\zeta) = a^u (a^b)^v$ are also in K
for positive integral u and v. (We need not examine the derivatives,
since $\zeta_1, \zeta_2, \ldots$ are distinct.) Moreover, if c is a positive rational
integer such that ca, cb, and $ca^b$ are integers of K, then we can choose

$$H_1(z_\kappa) = c^{u+v} \quad \text{and} \quad H_2(z_\kappa) = c$$

for $z_\kappa = u + vb$. It follows from this and the definitions of $f_1(z)$ and
$f_2(z)$ that the inequality (32) holds. Thus, under the assumption
that a, b, and $a^b$ are all algebraic, all hypotheses of Theorem 5-9 are
satisfied, and it follows that z and $a^z$ are algebraically dependent.
This being palpably false, the above assumption cannot be maintained,
and the theorem is proved.


PROBLEM 


Show that $e^{\vartheta}$ is transcendental for algebraic $\vartheta \ne 0$. [Hint: Choose
$f_1(z) = e^z$, $f_2(z) = z$, and

$$\zeta_{(n-1)^2 + 1} = \cdots = \zeta_{n^2} = n\vartheta,$$

for n = 1, 2, ....]




REFERENCES 
Section 5-1 
I. Niven, Bulletin of the American Mathematical Society 53, 509 (1947). 


Section 5-2 


J. Liouville, Comptes Rendus Hebdomadaires des Séances de l'Académie
des Sciences (Paris) 18, 883-885, 910-911 (1844); Journal des Mathématiques
Pures et Appliquées (Paris) 16, 133-142 (1851).


Sections 5-3, 5-4, 5-5 


Most of this material is adapted from Mahler, Journal für die Reine und
Angewandte Mathematik (Berlin) 166, 117-150 (1932). For the existence of
U-numbers of each degree, see LeVeque, Journal of the London Mathematical
Society 28, 220-229 (1953).


Section 5-6 


Siegel's work on Bessel functions is to be found in Abhandlung der Kgl.
Preussischen Akademie der Wissenschaften (Berlin), article no. 1, 70 pp.
(1929). The first result stated concerning the $\wp$-function is due to Siegel,
Journal für die Reine und Angewandte Mathematik 167, 62-69 (1932); the
second to Schneider, ibid., 172, 70-74 (1934). The transcendence of decimals
formed from polynomial values was proved by Mahler, Proceedings
Konink. Nederlandsche Akademie van Wetenschappen (Amsterdam) 40,
421-428 (1937), and that of the series $\sum [n\omega] z^n$ by Mahler, Mathematische
Annalen (Leipzig) 101, 342-366 (1929) and 103, 532 (1930), and Mathematische
Zeitschrift (Berlin) 32, 545-585 (1930).

Schneider's Theorem 5-9 appeared in Mathematische Annalen 121,
131-140 (1949); he includes a bibliography of work on integral-valued
functions. Pólya's work appeared in Nachrichten von der Gesellschaft der
Wissenschaften zu Göttingen, pp. 1-10 (1920), and Gelfond's in Tôhoku
Mathematical Journal (Sendai, Japan) 30, 280-285 (1929).


Section 5-7 


Hilbert's problems appeared in Nachrichten von der Gesellschaft der
Wissenschaften zu Göttingen, pp. 253-297 (1900). The complete solution
of the seventh was given by Gelfond, Comptes Rendus de l'Académie des
Sciences de l'U.R.S.S. (Moscow) 2, 1-3 (in Russian), 4-6 (in French)
(1934), and Bulletin de l'Académie des Sciences de l'U.R.S.S. (Leningrad)
7, 623-640 (1934); and by Schneider, Journal für die Reine und Angewandte
Mathematik 172, 65-69 (1934). There is an excellent exposition of
Gelfond's method by E. Hille, American Mathematical Monthly 49, 654-661
(1942).


CHAPTER 6 
DIRICHLET’S THEOREM 


In this chapter and the next we shall consider various questions 
concerning the distribution of the rational primes. This is a large 
and difficult field, and we shall be able to obtain only a few of the 
important results. The first of them, to which this chapter is devoted,
is Dirichlet's famous theorem that there are infinitely many
primes of the form km + l, where k and l are fixed integers which are
relatively prime.


6-1 Introduction. Although proofs of certain special cases of 
Dirichlet’s theorem are given in elementary texts,* the methods used 
cannot be generalized to prove the full theorem. To get an idea of 
the method used by Dirichlet, let us consider the question of the 
infinitude of the set of primes of the form 4k + 1. We base the discussion
on the Riemann ζ-function, defined for s > 1 by the equation

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]

This is perhaps the simplest of all the Dirichlet series

\[ \sum_{n=1}^{\infty} \frac{a_n}{n^s}, \]

which play an important role in prime number theory. One reason
for their importance is exhibited in the following theorem, which 
gives a relation between the set of primes and the set of positive 
integers. 


THEOREM 6-1. For s > 1,

\[ \zeta(s) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1}. \tag{1} \]

Proof: In less abbreviated form, the assertion is that

\[ \lim_{N\to\infty} \prod_{p \le N} \left(1 - \frac{1}{p^s}\right)^{-1} = \lim_{N\to\infty} \sum_{n=1}^{N} \frac{1}{n^s}. \]

* See, for example, Volume I, pp. 9, 46, 59.



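The relation

\[ \frac{1}{1-x} = \sum_{n=0}^{\infty} x^n \]

holds for |x| < 1; since $|p^{-s}| < 1$, we have

\[ \prod_{p \le N} (1 - p^{-s})^{-1} = \prod_{p \le N} (1 + p^{-s} + p^{-2s} + \cdots). \]
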

Multiplying out the product on the right, we obtain terms of the
form $n^{-s}$, where n runs over the integers composed exclusively of
primes not exceeding N. Moreover, each such n occurs exactly once,
by the Unique Factorization Theorem. The multiplication is permissible,
since the series involved are absolutely convergent, and
the terms can be arranged in any order. Thus

\[ \prod_{p \le N} (1 - p^{-s})^{-1} = \sum{}' \, n^{-s}, \]

where the accent indicates a summation, in the natural order, over
all n such that p | n implies p ≤ N. In particular, the sum contains
all terms $n^{-s}$ for which n ≤ N. Hence

\[ \prod_{p \le N} (1 - p^{-s})^{-1} = \sum_{n=1}^{N} n^{-s} + \sum_{n > N}{}' \, n^{-s}, \]

and

\[ \sum_{n > N}{}' \, n^{-s} \le \sum_{n=N+1}^{\infty} n^{-s} < \int_N^{\infty} x^{-s}\, dx = \frac{1}{(s-1)N^{s-1}}, \]

since s > 1. Thus

\[ \sum_{n > N}{}' \, n^{-s} = o(1) \]

as N → ∞, and

\[ \lim_{N \to \infty} \prod_{p \le N} (1 - p^{-s})^{-1} = \sum_{n=1}^{\infty} n^{-s} = \zeta(s). \]
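The two-sided estimate in this proof can be checked numerically. The following is a minimal sketch, not part of the original text; the function names `primes_up_to`, `euler_product`, and `zeta_partial_sum` are illustrative choices. It compares the truncated product over primes p ≤ N with the partial sum of the series, whose difference the proof bounds by $1/((s-1)N^{s-1})$.

```python
import math

def primes_up_to(n):
    """Return all primes <= n by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # strike out the multiples of p starting at p*p
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if sieve[i]]

def euler_product(s, N):
    """Truncated product over primes p <= N of (1 - p^-s)^-1."""
    prod = 1.0
    for p in primes_up_to(N):
        prod *= 1.0 / (1.0 - p ** (-s))
    return prod

def zeta_partial_sum(s, N):
    """Partial sum of n^-s for n = 1..N."""
    return sum(n ** (-s) for n in range(1, N + 1))

if __name__ == "__main__":
    s, N = 2.0, 1000
    P, S = euler_product(s, N), zeta_partial_sum(s, N)
    # The proof gives S <= P <= S + 1/((s-1) N^(s-1)).
    print(S, P, S + 1.0 / ((s - 1) * N ** (s - 1)))
```

For s = 2 both quantities approach $\pi^2/6$, a classical value that is not derived in this chapter.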


To see exactly how ζ(s) behaves as s → 1+, we use the following
standard result.*

Lemma. Suppose that $\lambda_1, \lambda_2, \ldots$ is a nondecreasing sequence tending
to infinity, that $c_1, c_2, \ldots$ is an arbitrary sequence of real or
complex numbers, and that f(x) has a continuous derivative for
$x \ge \lambda_1$. Put

\[ C(x) = \sum_{\lambda_n \le x} c_n. \]

Then for $x \ge \lambda_1$,

\[ \sum_{\lambda_n \le x} c_n f(\lambda_n) = C(x) f(x) - \int_{\lambda_1}^{x} C(t) f'(t)\, dt. \]

* See, for example, Volume I, Theorem 6-15.


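Applying this with $\lambda_n = n$, $c_n = 1$, $f(x) = x^{-s}$, we obtain

\[ \sum_{n \le x} \frac{1}{n^s} = \frac{[x]}{x^s} + s \int_1^x \frac{[t]}{t^{s+1}}\, dt \]

for x ≥ 1. If we put (x) = x - [x], we have for s > 1,

\[ \begin{aligned}
\sum_{n \le x} \frac{1}{n^s} &= s \int_1^x \frac{t - (t)}{t^{s+1}}\, dt + \frac{x - (x)}{x^s} \\
&= s \int_1^x \frac{dt}{t^s} - s \int_1^x \frac{(t)}{t^{s+1}}\, dt + \frac{1}{x^{s-1}} - \frac{(x)}{x^s} \\
&= \frac{s}{s-1} - \frac{s}{(s-1)x^{s-1}} - s \int_1^x \frac{(t)}{t^{s+1}}\, dt + \frac{1}{x^{s-1}} - \frac{(x)}{x^s}.
\end{aligned} \]

Letting x increase without bound and noting that 0 ≤ (x) < 1,
we have

\[ \zeta(s) = \frac{s}{s-1} - s \int_1^{\infty} \frac{(t)}{t^{s+1}}\, dt. \tag{2} \]

This expression for ζ(s) agrees with the earlier definition for s > 1,
but it is also meaningful for 0 < s < 1, since the integral converges
for all s > 0. It may therefore be thought of as defining ζ(s) for
s > 0, s ≠ 1. At any rate, (2) shows that

\[ \lim_{s \to 1^+} \zeta(s)(s - 1) = 1, \tag{3} \]

and a fortiori that

\[ \lim_{s \to 1^+} \zeta(s) = \infty. \tag{4} \]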
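Formula (2) lends itself to direct computation, because on each interval [n, n+1] the integrand $(t)/t^{s+1} = (t-n)t^{-s-1}$ has an elementary antiderivative. The sketch below is an illustration under stated assumptions: the names `frac_integral` and `zeta_from_formula_2` are hypothetical, and the infinite integral is simply truncated at a finite N, so the values are approximations only.

```python
def frac_integral(s, N):
    """Integral of (t)/t^(s+1) over [1, N], where (t) = t - [t].

    On [n, n+1] the integrand equals (t - n) * t^(-s-1), whose antiderivative is
    t^(1-s)/(1-s) + (n/s) * t^(-s)   (valid for s > 0, s != 1).
    """
    total = 0.0
    for n in range(1, N):
        a, b = float(n), float(n + 1)
        total += (b ** (1 - s) - a ** (1 - s)) / (1 - s)
        total += (n / s) * (b ** (-s) - a ** (-s))
    return total

def zeta_from_formula_2(s, N=200_000):
    """Approximate zeta(s) = s/(s-1) - s * integral, truncating the integral at N."""
    return s / (s - 1) - s * frac_integral(s, N)

if __name__ == "__main__":
    for s in (2.0, 1.01, 0.5):
        z = zeta_from_formula_2(s)
        print(s, z, (s - 1) * z)   # the last column illustrates relation (3) near s = 1
```

The same routine returns a finite (negative) value at s = 1/2, which illustrates the remark that (2) is meaningful for 0 < s < 1.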


For the remainder of this section, let q and r designate primes of
the forms 4k + 1 and 4k - 1 respectively. Define the function
χ(n) by the equations

\[ \chi(1) = 1, \quad \chi(q) = 1, \quad \chi(r) = -1, \quad \chi(2) = 0, \]
\[ \chi(mn) = \chi(m)\chi(n) \quad \text{for every pair of integers } m, n. \]

(A function which satisfies the last of these conditions is said to be
completely multiplicative; it is entirely determined when its values for
all prime arguments are known, since $\chi(p^{\alpha}) = (\chi(p))^{\alpha}$.) Inasmuch
as n ≡ 1 (mod 4) if and only if 2 ∤ n and the total number of r’s
dividing n is even, we have

\[ \chi(n) = \begin{cases} 0 & \text{if } 2 \mid n, \\ (-1)^{(n-1)/2} & \text{if } 2 \nmid n. \end{cases} \]
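We now investigate the function

\[ L(s) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}. \]

If we write $\sum a_n \ll \sum b_n$ to mean that $|a_n| \le b_n$ for n = 1, 2, ...,
then

\[ \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} \ll \sum_{n=1}^{\infty} \frac{1}{n^s} = \zeta(s) \]

for s > 1, so that the series for L(s) is absolutely convergent for
s > 1. More than this is true, however: the series for L(s) converges
for s > 0. For we note that for any n > 0,

\[ \chi(n) + \chi(n+1) + \chi(n+2) + \chi(n+3) = 0, \]

so that we have

\[ \begin{aligned}
\sum_{n=1}^{N} \chi(n) &= \sum_{n=1}^{4} \chi(n) + \sum_{n=5}^{8} \chi(n) + \cdots + \sum_{n=4[N/4]-3}^{4[N/4]} \chi(n) + \sum_{n=4[N/4]+1}^{N} \chi(n) \\
&= 0 + 0 + \cdots + 0 + \sum_{n=4[N/4]+1}^{N} \chi(n),
\end{aligned} \]

and hence

\[ \left| \sum_{n=1}^{N} \chi(n) \right| \le 1. \]

The truth of the assertion is therefore a weak consequence of the
following theorem, which is due to Abel.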


THEOREM 6-2. If $\{a_n\}$ is a sequence of constants for which

\[ \sum_{n=1}^{N} a_n = O(1) \]

as N → ∞, and if $\{b_n(s)\}$ is a sequence of positive-valued functions
which converges monotonically and uniformly to zero for s in some
interval J, then the series

\[ \sum_{n=1}^{\infty} a_n b_n(s) \]

converges uniformly for s in J.

Proof: Put

\[ A_n = \sum_{k=1}^{n} a_k, \]

so that $|A_n| < A$ for some A and all n. Using the monotonicity of
$b_n(s)$, we have

\[ \begin{aligned}
\left| \sum_{n=j}^{k} a_n b_n(s) \right| &= \left| \sum_{n=j}^{k} (A_n - A_{n-1}) b_n(s) \right| \\
&= \left| \sum_{n=j}^{k-1} A_n \bigl(b_n(s) - b_{n+1}(s)\bigr) + A_k b_k(s) - A_{j-1} b_j(s) \right| \\
&\le A\bigl(b_j(s) - b_k(s)\bigr) + A b_k(s) + A b_j(s) = 2A b_j(s).
\end{aligned} \]

By hypothesis, this upper bound can be made uniformly small, for
s in J, by taking j sufficiently large. This proves the theorem.


Here we have a situation which does not arise in the case of power 
series. For while a power series converges absolutely at every interior 
point of its interval of convergence, the Dirichlet series for L(s) 
converges for s > 0, but converges absolutely only for s > 1, since 
the series

\[ \sum_{n=1}^{\infty} \left| \frac{\chi(n)}{n} \right| = \sum_{n=0}^{\infty} \frac{1}{2n+1} \]

diverges.
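The contrast between conditional and absolute convergence can be seen numerically. The sketch below is illustrative only (the names `chi4`, `L_partial`, and `abs_partial` are not from the text): it checks that the partial sums of χ stay in {0, 1}, that the partial sums of L(1) settle down, and that the series of absolute values keeps growing. The comparison value π/4 is Leibniz's classical evaluation of 1 - 1/3 + 1/5 - ..., which the chapter itself does not state.

```python
import math
from itertools import accumulate

def chi4(n):
    """The character of Section 6-1: 0 for even n, (-1)^((n-1)/2) for odd n."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def L_partial(s, N):
    """Partial sum of chi4(n)/n^s for n = 1..N."""
    return sum(chi4(n) / n ** s for n in range(1, N + 1))

def abs_partial(N):
    """Partial sum of |chi4(n)|/n, that is, of 1/(2m+1) over odd n <= N."""
    return sum(abs(chi4(n)) / n for n in range(1, N + 1))

if __name__ == "__main__":
    # Partial sums of chi4 are bounded, as Theorem 6-2 requires.
    print(set(accumulate(chi4(n) for n in range(1, 1001))))
    print(L_partial(1, 10**5), math.pi / 4)
    print(abs_partial(10**2), abs_partial(10**4))   # grows like (1/2) log N
```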
On account of the complete multiplicativity of x, we have 


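\[ \left(1 - \frac{\chi(p)}{p^s}\right)^{-1} = 1 + \frac{\chi(p)}{p^s} + \frac{(\chi(p))^2}{p^{2s}} + \cdots = 1 + \frac{\chi(p)}{p^s} + \frac{\chi(p^2)}{p^{2s}} + \cdots. \]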


Using this idea, the proof of Theorem 6-1 can easily be modified to 
yield 


THEOREM 6-3. If f is completely multiplicative, and the series

\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]

converges absolutely for $s > s_0$, then

\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \prod_p \left(1 - \frac{f(p)}{p^s}\right)^{-1} \]

for $s > s_0$.

COROLLARY. For s > 1,

\[ L(s) = \prod_p \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}. \]

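We are finally in a position to prove Dirichlet’s theorem for primes
of the form 4k + 1. Let s be greater than 1. We have

\[ \zeta(s) = \prod_p (1 - p^{-s})^{-1} = (1 - 2^{-s})^{-1} \prod_q (1 - q^{-s})^{-1} \prod_r (1 - r^{-s})^{-1}, \]

and, from the corollary to Theorem 6-3,

\[ L(s) = \prod_q (1 - q^{-s})^{-1} \prod_r (1 + r^{-s})^{-1}. \]

Hence

\[ \zeta(s) L(s) = (1 - 2^{-s})^{-1} \prod_q (1 - q^{-s})^{-2} \prod_r (1 - r^{-2s})^{-1}. \tag{6} \]

Now, for s > 1, grouping the terms of the series in pairs gives

\[ L(s) = (1 - 3^{-s}) + (5^{-s} - 7^{-s}) + \cdots > 1 - 3^{-s} > \tfrac{2}{3}, \]

and so

\[ \lim_{s \to 1^+} \zeta(s) L(s) = \infty, \]

by (4). If there were only finitely many primes q, the expression
on the right side of (6) would remain bounded as s → 1+, since
for s > 1,

\[ \prod_r (1 - r^{-2s})^{-1} < \prod_r (1 - r^{-2})^{-1} < \prod_p (1 - p^{-2})^{-1} = \zeta(2). \]

This contradiction shows not only that there are infinitely many
primes q, but also that they occur sufficiently frequently that

\[ \lim_{s \to 1^+} \prod_q (1 - q^{-s})^{-1} = \infty. \]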

The proof which has just been given contains most of the essential
features of the general proof. The major formal difference which will
arise in the general case is that we shall have to consider a number
of functions like χ above, and each will have an associated Dirichlet
series, some aspects of whose behavior must be investigated. The
most difficult part of the proof lies in showing that these series do
not vanish at s = 1, a point which caused no trouble in the present
case.


PROBLEM 


Let $b_n$ = 1 or 0, according as the equation n = x² + y² has or does not
have a solution in integers x, y. It is known* that $b_n$ = 1 if and only if
every prime r ≡ -1 (mod 4) which divides n occurs to an even power in
the canonical factorization of n. Show that the series

\[ \sum_{n=1}^{\infty} \frac{b_n}{n^s} \]

converges for s > 1, and diverges for s ≤ 1. [Hint: Establish a relation
among ζ(s), L(s), and the square of the given series.]


6-2 Characters. We recall that the elements of a reduced residue 
system (mod k) form an abelian group under multiplication (mod k), 
which we designate by M(k). The number of elements of M(k), 
called its order, is φ(k); hereafter we shall use h as an abbreviation for
φ(k).

One of the fundamental theorems on finite abelian groups is that 
every such group has a basis: if it is a multiplicative group, this 


means that there is a set of elements $A_1, \ldots, A_r$ such that every
element of the group can be written uniquely in the form

\[ A_1^{x_1} \cdots A_r^{x_r}, \]

where each $x_i$ is one of the integers 0, 1, ..., ord $A_i$ - 1, and ord $A_i$ is
the order of the cyclic subgroup generated by $A_i$. Moreover, the
product of all the numbers ord $A_i$ is the order of the group. The
following theorem, for which we give a proof based on the theory of
primitive roots,† is a special case.


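THEOREM 6-4. (a) Let $k = p_1^{\alpha_1} \cdots p_r^{\alpha_r}$, where $p_i \ne p_j$ and each
of the prime powers $p_i^{\alpha_i}$ has a primitive root, say $g_i$. Then the numbers
$A_1, \ldots, A_r$ form a basis for M(k) if, for each i,

\[ A_i \equiv \begin{cases} g_i \pmod{p_i^{\alpha_i}}, \\ 1 \pmod{p_j^{\alpha_j}} & \text{for } j \ne i,\ 1 \le j \le r. \end{cases} \]

(b) Let $k = 2^{\alpha} p_2^{\alpha_2} \cdots p_r^{\alpha_r}$, where α ≥ 3, and let $g_i$ be a primitive
root of $p_i^{\alpha_i}$ for 2 ≤ i ≤ r. Then the numbers $A_0, A_1, \ldots, A_r$ constitute
a basis for M(k), where

\[ A_0 \equiv \begin{cases} -1 \pmod{2^{\alpha}}, \\ 1 \pmod{p_i^{\alpha_i}} & \text{for } 2 \le i \le r, \end{cases} \qquad
A_1 \equiv \begin{cases} 5 \pmod{2^{\alpha}}, \\ 1 \pmod{p_i^{\alpha_i}} & \text{for } 2 \le i \le r, \end{cases} \]

and for 2 ≤ i ≤ r,

\[ A_i \equiv \begin{cases} g_i \pmod{p_i^{\alpha_i}}, \\ 1 \pmod{p_j^{\alpha_j}} & \text{for } j \ne i,\ 1 \le j \le r, \end{cases} \]

with $p_1^{\alpha_1} = 2^{\alpha}$.

* See, for example, Volume I, Theorem 7-3.
† See, for example, Volume I, Chapter 4.
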

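Proof: Let a be relatively prime to k. Then it is also prime to every
divisor of k, so that there are unique elements $a_1, \ldots, a_r$ of
$M(p_1^{\alpha_1}), \ldots, M(p_r^{\alpha_r})$, respectively, such that

\[ \begin{aligned} a &\equiv a_1 \pmod{p_1^{\alpha_1}}, \\ &\ \ \vdots \\ a &\equiv a_r \pmod{p_r^{\alpha_r}}. \end{aligned} \tag{6} \]

Conversely, for any choice of $a_1, \ldots, a_r$ in $M(p_1^{\alpha_1}), \ldots, M(p_r^{\alpha_r})$,
respectively, the system (6) has a solution a which is unique modulo k,
by the Chinese Remainder Theorem, and a is prime to k. Moreover,
if a is the solution of (6), and if, for 1 ≤ i ≤ r, $b_i$ is the solution of the
system

\[ b_i \equiv \begin{cases} a_i \pmod{p_i^{\alpha_i}}, \\ 1 \pmod{p_j^{\alpha_j}} & \text{for } j \ne i,\ 1 \le j \le r, \end{cases} \tag{7} \]

then

\[ b_1 \cdots b_r \equiv 1 \cdots 1 \cdot a_i \cdot 1 \cdots 1 \equiv a_i \pmod{p_i^{\alpha_i}}, \quad \text{for } 1 \le i \le r, \]

so that

\[ a \equiv b_1 \cdots b_r \pmod{k}. \tag{8} \]

(Thus, in the language of group theory, M(k) is the direct product
of $M(p_1^{\alpha_1}), \ldots, M(p_r^{\alpha_r})$.)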


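Now if $p_i^{\alpha_i}$ has a primitive root $g_i$, and

\[ A_i \equiv \begin{cases} g_i \pmod{p_i^{\alpha_i}}, \\ 1 \pmod{p_j^{\alpha_j}} & \text{for } j \ne i,\ 1 \le j \le r, \end{cases} \]

then, since

\[ a_i \equiv g_i^{\operatorname{ind} a_i} \pmod{p_i^{\alpha_i}}, \]

we have that

\[ b_i \equiv A_i^{\operatorname{ind} a_i} \pmod{p_j^{\alpha_j}}, \quad \text{for } 1 \le j \le r, \]

and hence

\[ b_i \equiv A_i^{\operatorname{ind} a_i} \pmod{k}. \]

Thus by the congruence (8), if all $p_i^{\alpha_i}$ have primitive roots,

\[ a \equiv A_1^{\operatorname{ind} a_1} \cdots A_r^{\operatorname{ind} a_r} \pmod{k}, \]

and this representation is unique if each index is given its smallest
non-negative value, so that $0 \le \operatorname{ind} a_i < \varphi(p_i^{\alpha_i})$.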


On the other hand, if $p_1^{\alpha_1} = 2^{\alpha}$ with α ≥ 3, then -1 and 5 constitute
a basis for $M(p_1^{\alpha_1})$. For* 5 is a primitive λ-root of $2^{\alpha}$, so that
the $2^{\alpha-2}$ numbers

\[ 5, 5^2, \ldots, 5^{2^{\alpha-2}} \]

are distinct (mod $2^{\alpha}$); since they are all congruent to 1 (mod 4),
and since there are exactly $2^{\alpha-2}$ numbers in a reduced residue system
(mod $2^{\alpha}$) which are congruent to 1 (mod 4), these must be the
numbers. Likewise, their negatives are all the numbers congruent
to -1 (mod 4) in a reduced residue system (mod $2^{\alpha}$).† Hence, if a
is in $M(2^{\alpha})$, then, for some choice of $x_0$ and $x_1$,

\[ a \equiv (-1)^{x_0} 5^{x_1} \pmod{2^{\alpha}}. \]

Thus if $A_0, \ldots, A_r$ are defined as in part (b) of the theorem, we have

\[ a \equiv A_0^{x_0} A_1^{x_1} A_2^{\operatorname{ind} a_2} \cdots A_r^{\operatorname{ind} a_r} \pmod{k}, \]

and the representation is again unique if we require that

\[ \begin{aligned} 0 &\le x_0 < \operatorname{ord} A_0 = 2, \\ 0 &\le x_1 < \operatorname{ord} A_1 = 2^{\alpha-2}, \\ 0 &\le \operatorname{ind} a_i < \operatorname{ord} A_i = \varphi(p_i^{\alpha_i}). \end{aligned} \]

* See Volume I, Theorem 4-9.
† A similar argument is used in Volume I, in the proof of Theorem 5-1.

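Notice that in the two cases we have

\[ \operatorname{ord} A_1 \cdots \operatorname{ord} A_r = \varphi(p_1^{\alpha_1}) \cdots \varphi(p_r^{\alpha_r}) = h, \]
\[ \operatorname{ord} A_0 \cdot \operatorname{ord} A_1 \cdots \operatorname{ord} A_r = 2 \cdot 2^{\alpha-2} \cdot \varphi(p_2^{\alpha_2}) \cdots \varphi(p_r^{\alpha_r}) = h. \]

To obviate the distinction between cases (a) and (b), we rename the
basis elements $B_1, \ldots, B_m$, and put ord $B_i = h_i$ for i = 1, ..., m.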
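Part (a) of Theorem 6-4 can be tried out on a small modulus. The sketch below is illustrative only: the helper names (`crt_basis`, `order`, `generated_elements`, `reduced_residues`) are hypothetical, the caller must supply genuine primitive roots, and `pow(x, -1, q)` for the modular inverse assumes Python 3.8 or later. It builds each $A_i$ by the Chinese Remainder Theorem and confirms that the products $A_1^{x_1} \cdots A_r^{x_r}$ run through a reduced residue system exactly once.

```python
from itertools import product
from math import gcd

def crt_basis(moduli_and_roots):
    """Given [(q_1, g_1), ..., (q_r, g_r)] with pairwise coprime prime powers q_i
    and g_i a primitive root of q_i, return (k, [A_1, ..., A_r]) where
    A_i = g_i (mod q_i) and A_i = 1 (mod q_j) for j != i."""
    k = 1
    for q, _ in moduli_and_roots:
        k *= q
    basis = []
    for q, g in moduli_and_roots:
        rest = k // q
        # e is 1 (mod q) and 0 (mod every other q_j)
        e = rest * pow(rest, -1, q)
        basis.append((1 + (g - 1) * e) % k)
    return k, basis

def order(a, k):
    """Multiplicative order of a modulo k (a must be prime to k)."""
    n, x = 1, a % k
    while x != 1:
        x = x * a % k
        n += 1
    return n

def generated_elements(k, basis):
    """All products A_1^x_1 ... A_r^x_r with 0 <= x_i < ord A_i."""
    orders = [order(a, k) for a in basis]
    out = []
    for exps in product(*(range(h) for h in orders)):
        x = 1
        for a, e in zip(basis, exps):
            x = x * pow(a, e, k) % k
        out.append(x)
    return out

def reduced_residues(k):
    """The elements of M(k), listed in increasing order."""
    return [a for a in range(1, k) if gcd(a, k) == 1]

if __name__ == "__main__":
    # k = 45 = 9 * 5; 2 is a primitive root of both 9 and 5.
    k, basis = crt_basis([(9, 2), (5, 2)])
    print(k, basis, [order(a, k) for a in basis])
    print(sorted(generated_elements(k, basis)) == reduced_residues(k))
```

Comparing the sorted list of products with the reduced residue system checks existence and uniqueness of the representation at once, since a repeated product would leave some residue unrepresented.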

A complex-valued function χ, defined over the group M(k) (more
generally, over any finite abelian group), is called a character (mod k)
(or a character of the group) if it is completely multiplicative and not
identically zero, that is, if

\[ \chi(ab) = \chi(a)\chi(b), \quad \text{for } a \text{ and } b \text{ in } M(k), \]
\[ \chi(a) \ne 0, \quad \text{for some } a \text{ in } M(k). \]

Since in the group M(k) we identify integers which are congruent
(mod k), we have

\[ \chi(a) = \chi(a'), \quad \text{if } a \equiv a' \pmod{k} \text{ and } (a, k) = (a', k) = 1, \]

so that one could also think of characters as being defined over the
residue classes themselves. Notice that necessarily χ(1) = 1, since
for any a for which χ(a) ≠ 0, we have χ(a) = χ(a·1) = χ(a)χ(1).
Moreover, if a is in M(k) and ord a = t, then

\[ (\chi(a))^t = \chi(a^t) = \chi(1) = 1. \]

Since t | h, it follows that every value of every character is an hth root
of unity.

On account of its complete multiplicativity, any character is totally
determined when its value is specified for each basis element $B_j$.
Thus the characters are contained in the set of all completely multiplicative
functions over M(k) for which

\[ \chi(B_j) = e^{2\pi i \beta_j / h_j}, \quad 0 \le \beta_j < h_j, \tag{9} \]

for j = 1, ..., m. But conversely, every such function is obviously
a character, and different choices of the β’s lead to different characters.
Thus there are h different characters, corresponding to the $h_1 \cdots h_m$
different m-tuples $(\beta_1, \ldots, \beta_m)$.
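The count of h characters can be illustrated for one small modulus. The following is a minimal sketch, with the modulus k = 15 and its basis 11, 7 (of orders 2 and 4) hard-coded as assumptions for illustration; the function names are hypothetical. Each choice of $(\beta_1, \beta_2)$ in (9) yields one character. The value 0 returned for arguments not prime to k is a convention that the text adopts only later in this section.

```python
import cmath
from itertools import product

K = 15                 # illustrative modulus
BASIS = [11, 7]        # 11 = (2 mod 3, 1 mod 5), 7 = (1 mod 3, 2 mod 5)
ORDERS = [2, 4]        # ord 11 = 2, ord 7 = 4, so h = 2 * 4 = 8

def exponent_table():
    """Map each a in M(K) to its exponent vector (x_1, ..., x_m)."""
    table = {}
    for exps in product(*(range(h) for h in ORDERS)):
        a = 1
        for b, e in zip(BASIS, exps):
            a = a * pow(b, e, K) % K
        table[a] = exps
    return table

def make_character(betas, table):
    """Character with chi(B_j) = exp(2*pi*i*beta_j/h_j), extended multiplicatively."""
    def chi(a):
        exps = table.get(a % K)
        if exps is None:          # (a, K) > 1: later convention chi(a) = 0
            return 0
        angle = sum(b * x / h for b, x, h in zip(betas, exps, ORDERS))
        return cmath.exp(2j * cmath.pi * angle)
    return chi

def all_characters():
    """One character for each m-tuple (beta_1, ..., beta_m) with 0 <= beta_j < h_j."""
    table = exponent_table()
    return [make_character(betas, table)
            for betas in product(*(range(h) for h in ORDERS))]

if __name__ == "__main__":
    chars = all_characters()
    print(len(chars))                       # the number of characters (mod 15)
    print([round(chars[1](a).real) for a in sorted(exponent_table())])
```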

Two groups G and G’, with elements a, b, ... and a’, b’, ..., are
said to be isomorphic if it is possible to find a pairing of elements of G
with elements of G’, such that each element of G corresponds to
precisely one element of G’, and conversely, and such that if a ↔ a’
and b ↔ b’, then ab ↔ a’b’. In this case the groups are abstractly
identical, and any theorem concerning one group has an immediate
analog for the other group. To construct such an isomorphism
between two finite abelian groups, it suffices to find a one-to-one
correspondence of basis elements such that corresponding elements
have equal orders. For let the bases be $C_1, \ldots, C_s$ and $C_1', \ldots, C_s'$,
so named that ord $C_i$ = ord $C_i'$, for i = 1, ..., s. Then we can
make a and a’ correspond if

\[ a = C_1^{x_1} \cdots C_s^{x_s} \quad \text{and} \quad a' = C_1'^{\,x_1} \cdots C_s'^{\,x_s}, \qquad 0 \le x_i < \operatorname{ord} C_i. \]

For if also

\[ b = C_1^{y_1} \cdots C_s^{y_s} \quad \text{and} \quad b' = C_1'^{\,y_1} \cdots C_s'^{\,y_s}, \]

then

\[ ab = C_1^{x_1+y_1} \cdots C_s^{x_s+y_s}, \qquad a'b' = C_1'^{\,x_1+y_1} \cdots C_s'^{\,x_s+y_s}, \]

and

\[ (ab)' = C_1'^{\,x_1+y_1} \cdots C_s'^{\,x_s+y_s} = a'b'. \]

Moreover, this is a one-to-one correspondence, since the representations
by basis elements are unique for the ranges $0 \le x_i < \operatorname{ord} C_i = \operatorname{ord} C_i'$,
1 ≤ i ≤ s.
For the basis $B_1, \ldots, B_m$ of M(k), define characters $\chi_1, \ldots, \chi_m$
as follows:

\[ \chi_{\mu}(B_j) = \begin{cases} e^{2\pi i / h_{\mu}} & \text{if } j = \mu, \\ 1 & \text{if } j \ne \mu,\ 1 \le j \le m. \end{cases} \tag{10} \]

Then from the sentence containing equation (9), we see that every
character can be represented uniquely in the form

\[ \chi = \chi_1^{\beta_1} \cdots \chi_m^{\beta_m}, \quad 0 \le \beta_i < h_i \text{ for } i = 1, \ldots, m, \]

since this gives $\chi(B_j)$ as in (9). (We say that two characters are
equal if they have the same value for every element of the group, and
define the product of two characters as the function whose values
are the products of the component values; this function is also a
character, by the sentence following (9).) Under multiplication, the
characters form a group X(k), having basis $\chi_1, \ldots, \chi_m$; since
ord $\chi_i$ = ord $B_i$, the groups X(k) and M(k) are isomorphic. The
unit element of X(k) is the character $\chi_0$, the principal character,
such that $\chi_0(a) = 1$ for every a in M(k).
We summarize the chief results obtained so far.


THEOREM 6-5. There are h distinct characters (mod k), and these
form a group X(k) which is isomorphic to M(k). Every value χ(a)
is an hth root of unity. The characters $\chi_1, \ldots, \chi_m$ defined in (10)
form a basis for X(k).


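We shall also need the following result.

THEOREM 6-6. If χ is in X(k), then

\[ \sum_{a \in M(k)} \chi(a) = \begin{cases} h & \text{if } \chi = \chi_0, \\ 0 & \text{if } \chi \ne \chi_0, \end{cases} \]

while if a is in M(k), then

\[ \sum_{\chi \in X(k)} \chi(a) = \begin{cases} h & \text{if } a \equiv 1 \pmod{k}, \\ 0 & \text{if } a \not\equiv 1 \pmod{k}. \end{cases} \]

Proof: We have

\[ \sum_{a \in M(k)} \chi_0(a) = \sum_{a \in M(k)} 1 = h. \]

If $\chi \ne \chi_0$, then for some $\bar{a}$ in M(k), $\chi(\bar{a}) \ne 1$. For this $\bar{a}$,

\[ \chi(\bar{a}) \sum_a \chi(a) = \sum_a \chi(a)\chi(\bar{a}) = \sum_a \chi(a\bar{a}), \]

and, as a runs over a reduced residue system, so does $a\bar{a}$, so that

\[ \chi(\bar{a}) \sum_a \chi(a) = \sum_a \chi(a), \qquad \sum_a \chi(a) = 0. \]

If $a \not\equiv 1 \pmod{k}$, and $a = B_1^{x_1} \cdots B_m^{x_m}$, then some $x_i \ne 0$. For
this i, $\chi_i(a) \ne 1$, and

\[ \chi_i(a) \sum_{\chi} \chi(a) = \sum_{\chi} \chi_i(a)\chi(a) = \sum_{\chi} \chi'(a), \]

where $\chi' = \chi_i \chi$. As χ runs over X(k), so also does $\chi_i \chi = \chi'$, and

\[ \chi_i(a) \sum_{\chi} \chi(a) = \sum_{\chi} \chi(a), \qquad \sum_{\chi} \chi(a) = 0. \]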


χ(a) has so far been defined only for arguments relatively prime
to k. For simplicity in later formulas, we define

\[ \chi(a) = 0, \quad \text{if } (a, k) > 1. \]

This does not affect the validity of Theorem 6-6.

The duality of the relations of Theorem 6-6 is a reflection of the
isomorphism of X(k) and M(k). In a sense, the reason for the
importance of characters in the investigation of primes in progressions
lies in the second relation, since it singles out the elements of
a particular residue class (mod k), so that by use of the relation

\[ \sum_{\substack{u \le a \le v \\ a \equiv 1 \ (\mathrm{mod}\ k)}} g(a) = \frac{1}{h} \sum_{u \le a \le v} g(a) \sum_{\chi} \chi(a), \]

sums can be extended over an entire interval instead of a finite or
infinite arithmetic progression kt + 1. Moreover, by a slight modification,
any other residue class can be distinguished in the same way.
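THEOREM 6-7. If (a, k) = (b, k) = 1, then

\[ \sum_{\chi \in X(k)} \frac{\chi(a)}{\chi(b)} = \begin{cases} h & \text{if } a \equiv b \pmod{k}, \\ 0 & \text{otherwise.} \end{cases} \]

Proof: Choose c so that bc ≡ 1 (mod k). Then χ(b)χ(c) = χ(bc) = 1, so that

\[ \sum_{\chi \in X(k)} \frac{\chi(a)}{\chi(b)} = \sum_{\chi \in X(k)} \chi(ac), \]

and, by Theorem 6-6, the last sum is h or 0 according as ac is or is
not congruent to 1 (mod k), that is, according as a is or is not congruent
to b (mod k).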
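Both orthogonality relations are easy to verify by machine for a prime modulus, where M(k) is cyclic. The sketch below is an illustration only: it assumes the modulus 7 with primitive root 3 (constants `P`, `G`) and writes the characters as powers $\chi_1^{\beta}$ of the basis character of (10); the function names are hypothetical.

```python
import cmath

P, G = 7, 3            # illustrative: 3 is a primitive root of 7
H = P - 1              # h = phi(7) = 6

# index table: IND[a] = j where G^j = a (mod P)
IND = {pow(G, j, P): j for j in range(H)}

def chi(beta, a):
    """The character chi_1^beta evaluated at a, where chi_1(G) = exp(2*pi*i/h)."""
    a %= P
    if a == 0:
        return 0
    return cmath.exp(2j * cmath.pi * beta * IND[a] / H)

def sum_over_residues(beta):
    """Sum of chi(a) over a in M(P); Theorem 6-6 predicts h or 0."""
    return sum(chi(beta, a) for a in range(1, P))

def sum_over_characters(a, b=1):
    """Sum of chi(a)/chi(b) over all characters; Theorem 6-7 predicts h or 0."""
    return sum(chi(beta, a) / chi(beta, b) for beta in range(H))

if __name__ == "__main__":
    print([round(abs(sum_over_residues(beta)), 9) for beta in range(H)])
    print([[round(abs(sum_over_characters(a, b)), 9) for b in range(1, P)]
           for a in range(1, P)])
```

The second printout is h times the identity matrix, which is the sense in which the character sum "singles out" the residue class of b.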
It should be noticed that the function

\[ \chi(n) = \begin{cases} (-1)^{(n-1)/2} & \text{for } n \text{ odd}, \\ 0 & \text{for } n \text{ even}, \end{cases} \]

introduced in Section 6-1 is a character modulo 4. It and the principal
character

\[ \chi_0(n) = \begin{cases} 1 & \text{for } n \text{ odd}, \\ 0 & \text{for } n \text{ even}, \end{cases} \]

constitute the group X(4) of order φ(4) = 2. The correspondence

\[ \chi_0 \leftrightarrow 1, \qquad \chi \leftrightarrow 3, \]

describes the isomorphism between X(4) and M(4); each is the cyclic
group of two elements.


6-3 The L-functions. For each character χ, we define a function
L(s, χ) for s > 1 by the equation

\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \]

or equivalently (according to Theorem 6-3) by the equation

\[ L(s, \chi) = \prod_p \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}. \tag{11} \]

In particular,

\[ L(s, \chi_0) = \prod_{p \nmid k} (1 - p^{-s})^{-1} = \prod_{p \mid k} (1 - p^{-s}) \cdot \zeta(s), \]

so that, by equation (2),

\[ L(s, \chi_0) = \prod_{p \mid k} (1 - p^{-s}) \left( \frac{s}{s-1} - s \int_1^{\infty} \frac{(t)}{t^{s+1}}\, dt \right). \tag{12} \]

This latter representation for $L(s, \chi_0)$ is consistent with the series
definition for s > 1, and may be taken as the definition for 0 < s < 1.
For the proof of Dirichlet’s theorem, it is necessary to know some
of the properties of these L-functions. All the relevant properties
can be proved by elementary arguments, but the proofs frequently
can be simplified considerably if use is made of the theory of functions
of a complex variable. In these cases alternative proofs will
be given.


THEOREM 6-8. $L(s, \chi_0)$ is continuous for s > 1, and

\[ \lim_{s \to 1^+} (s - 1) L(s, \chi_0) = \frac{h}{k}. \]

Proof: For $s \ge s_0 > 1$,

\[ L(s, \chi_0) = \sum_{(n,k)=1} \frac{1}{n^s} \ll \sum_{n=1}^{\infty} \frac{1}{n^{s_0}} = \zeta(s_0), \]

so that the series for $L(s, \chi_0)$ converges uniformly in any interval to
the right of s = 1. Since the separate terms are continuous, the sum
is also continuous. Moreover, by (12),

\[ \lim_{s \to 1^+} (s - 1) L(s, \chi_0) = \prod_{p \mid k} \left(1 - \frac{1}{p}\right) = \frac{\varphi(k)}{k} = \frac{h}{k}. \]
For $\chi \ne \chi_0$, Theorem 6-6 shows that for arbitrary $n_0$,

\[ \sum_{n=n_0+1}^{n_0+k} \chi(n) = 0, \]

so that by grouping the terms of

\[ \sum_{n=1}^{N} \chi(n) \]

in blocks of k, with perhaps part of a block left over, we see that

\[ \left| \sum_{n=1}^{N} \chi(n) \right| < h. \]

It follows from Theorem 6-2 that the Dirichlet series for L(s, χ) is
convergent for s > 0. We need a slightly stronger result, which is
proved in the following theorem.

THEOREM 6-9. If $\chi \ne \chi_0$, then L(s, χ) has a continuous derivative
(and is therefore itself continuous) for s > 0.


Elementary proof: We use the standard theorem from analysis,
that if the series resulting from termwise differentiation of a given
series converges uniformly over an interval, then its sum is the
derivative of the original series. The termwise derivative of the
series for L(s, χ) is

\[ L'(s, \chi) = -\sum_{n=1}^{\infty} \frac{\chi(n) \log n}{n^s}, \tag{13} \]

and for $0 < s_0 \le s \le s_1$, the result follows from Theorem 6-2 by
taking $a_n = \chi(n)$, $b_n(s) = n^{-s} \log n$. But $s_0$ may be arbitrarily
small, and $s_1$ arbitrarily large, so that every s > 0 can be included in
an interval in which L(s, χ) is continuously differentiable.


Alternative proof: Applying Theorem 6-2 and the fact that 
t] f- +] 1 

yes Sy 
n=1 1 


n=1 1 


where σ = Re s, we see that the series for L(s, χ) is uniformly convergent
for Re s ≥ δ, δ > 0. Since each term of the series is an
analytic function of s, the sum is also analytic, and is therefore
differentiable.


6-4 Nonelementary proof of Dirichlet’s theorem. There is a proof
of Dirichlet’s theorem which is remarkably simple and illuminating,
and which fails to be elementary only in the sense that logarithms of
complex numbers are used. If the student who is not familiar with
this extension will assume that the usual properties of logarithms of
positive numbers (including the form of the Maclaurin expansion of
log (1 + x) for |x| < 1) carry over to logarithms of nonzero complex
numbers, he will find this proof much more straightforward than the
elementary proof given in the following section, where use is made of
the relation

\[ \frac{d}{dx} \log f(x) = \frac{f'(x)}{f(x)} \]

to avoid logarithms entirely.
For s > 1, $|\chi(p)/p^s| < 1$, so that for such s we can describe a
branch of the function $\log (1 - \chi(p)/p^s)$ by the equation

\[ \log \left(1 - \frac{\chi(p)}{p^s}\right) = -\sum_{m=1}^{\infty} \frac{1}{m} \left(\frac{\chi(p)}{p^s}\right)^m = -\sum_{m=1}^{\infty} \frac{\chi(p^m)}{m p^{ms}}. \]

By (11), this induces the choice

\[ \log L(s, \chi) = \sum_p \sum_{m=1}^{\infty} \frac{\chi(p^m)}{m p^{ms}} \quad \text{for } s > 1. \tag{14} \]

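THEOREM 6-10. For each χ, the function

\[ F(s, \chi) = \log L(s, \chi) - \sum_p \frac{\chi(p)}{p^s} \tag{15} \]

is bounded in absolute value for s ≥ 1.

Proof: We rewrite (14) in the form

\[ \log L(s, \chi) = \sum_p \frac{\chi(p)}{p^s} + \sum_p \sum_{m=2}^{\infty} \frac{\chi(p^m)}{m p^{ms}}. \]

Here,

\[ \left| \sum_p \sum_{m=2}^{\infty} \frac{\chi(p^m)}{m p^{ms}} \right| \le \sum_p \sum_{m=2}^{\infty} \frac{1}{p^{ms}} = \sum_p \frac{1}{p^{2s}(1 - p^{-s})} \le (1 - 2^{-s})^{-1} \sum_p \frac{1}{p^{2s}} \le 2 \sum_{n=1}^{\infty} \frac{1}{n^{2s}} = 2\zeta(2s), \]

and since ζ(2s) is bounded for 2s > 1 + ε, the theorem follows.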


We can now complete the proof of Dirichlet’s theorem, except for
one gap which will be considered later.

THEOREM 6-11 (Dirichlet’s theorem). If (k, l) = 1, then there are
infinitely many primes of the form kt + l.

Proof: Multiply equation (15) by 1/χ(l) and sum over all χ in
X(k). This gives

\[ \begin{aligned}
\sum_{\chi} \frac{\log L(s, \chi)}{\chi(l)} &= \sum_{\chi} \frac{1}{\chi(l)} \sum_p \frac{\chi(p)}{p^s} + \sum_{\chi} \frac{F(s, \chi)}{\chi(l)} \\
&= \sum_p \frac{1}{p^s} \sum_{\chi} \frac{\chi(p)}{\chi(l)} + \sum_{\chi} \frac{F(s, \chi)}{\chi(l)},
\end{aligned} \]

and, by Theorem 6-7,

\[ \sum_{\chi} \frac{\log L(s, \chi)}{\chi(l)} = h \sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{1}{p^s} + \sum_{\chi} \frac{F(s, \chi)}{\chi(l)}. \tag{16} \]

Let s → 1+ in (16). The second term on the right remains bounded,
by Theorem 6-10. We know that

\[ \lim_{s \to 1^+} L(s, \chi_0) = \infty, \]

so that

\[ \lim_{s \to 1^+} \frac{1}{\chi_0(l)} \log L(s, \chi_0) = \infty. \]

Suppose for the moment that it had been shown that the remaining
functions L(s, χ) (which we know to be continuous at s = 1) have
nonzero values L(1, χ) at s = 1. It would follow that

\[ \lim_{s \to 1^+} \sum_{\chi \ne \chi_0} \frac{1}{\chi(l)} \log L(s, \chi) \]

is finite, and (16) would then imply that

\[ \lim_{s \to 1^+} \sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{1}{p^s} = \infty, \]

an equation which is possible only if the sum has infinitely many
terms. Thus when we show that L(1, χ) ≠ 0 if $\chi \ne \chi_0$, we shall
have proved not only Dirichlet’s theorem but the stronger result that
the series

\[ \sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{1}{p} \]

diverges.
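The statement can be explored empirically by sorting primes into reduced residue classes. The sketch below is illustrative and proves nothing; the function names are hypothetical. It tabulates, for each l prime to k, how many primes p ≤ X lie in the class of l, together with the partial sum of 1/p over that class, which grows very slowly as X increases.

```python
import math

def primes_up_to(n):
    """Return all primes <= n by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if sieve[i]]

def reciprocal_sums_by_class(k, X):
    """For each l with (k, l) = 1 (and k >= 2), return (count, sum of 1/p)
    over the primes p <= X with p = l (mod k)."""
    stats = {l: [0, 0.0] for l in range(1, k) if math.gcd(l, k) == 1}
    for p in primes_up_to(X):
        l = p % k
        if l in stats:
            stats[l][0] += 1
            stats[l][1] += 1.0 / p
    return {l: tuple(v) for l, v in stats.items()}

if __name__ == "__main__":
    for X in (10**3, 10**4, 10**5):
        print(X, reciprocal_sums_by_class(10, X))
```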


6-5 Elementary proof of Dirichlet’s theorem. It is possible to
avoid the complex logarithm log L(s, χ) by using its derivative
instead:

\[ \frac{d}{ds} \log L(s, \chi) = \frac{L'(s, \chi)}{L(s, \chi)} = \frac{L'}{L}(s, \chi). \]

If we could use the relation (14), we could immediately deduce that

\[ \frac{L'(s, \chi)}{L(s, \chi)} = -\sum_p \sum_{m=1}^{\infty} \frac{\chi(p^m) \log p}{p^{ms}}; \]

since we cannot, we arrive at the same result by the rather more
awkward method of dividing L’(s, χ) by L(s, χ). In the process,
we shall have occasion to use some properties of the Möbius μ-function,
which is defined by the following relations:

\[ \mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n \text{ is divisible by a square larger than 1}, \\ (-1)^r & \text{if } n \text{ is the product of } r \text{ distinct primes.} \end{cases} \]

Alternatively, μ is the multiplicative function (that is, μ(mn) =
μ(m)μ(n) whenever (m, n) = 1) such that

\[ \mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ -1 & \text{if } n = p, \text{ a prime}, \\ 0 & \text{if } n = p^{\alpha},\ \alpha > 1. \end{cases} \]

The properties we shall need are these.*

(a)
\[ \sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases} \]

(b) If f is any number-theoretic function and

\[ F(n) = \sum_{d \mid n} f(d), \]

then

\[ f(n) = \sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right). \]
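Properties (a) and (b) can be checked directly for small n. The following is a minimal sketch using trial division; the helper names (`mobius`, `divisors`, `sum_function`, `mobius_inverse`) are illustrative and not from the text.

```python
def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    if n == 1:
        return 1
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided the original n
                return 0
            result = -result
        p += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def sum_function(f, n):
    """F(n) = sum of f(d) over d | n."""
    return sum(f(d) for d in divisors(n))

def mobius_inverse(F, n):
    """Recover f(n) = sum over d | n of mu(d) * F(n/d), as in property (b)."""
    return sum(mobius(d) * F(n // d) for d in divisors(n))

if __name__ == "__main__":
    print([mobius(n) for n in range(1, 11)])
    print([sum_function(mobius, n) for n in range(1, 11)])   # property (a)
    f = lambda n: n * n + 1                                  # an arbitrary test function
    F = lambda n: sum_function(f, n)
    print(all(mobius_inverse(F, n) == f(n) for n in range(1, 61)))  # property (b)
```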


THEOREM 6-12. If f is a completely multiplicative function, and the
series

\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]

converges absolutely for $s > s_0$, then

\[ \left( \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \right)^{-1} = \sum_{n=1}^{\infty} \frac{f(n)\mu(n)}{n^s} \]

for $s > s_0$.

Proof: We have

\[ \sum_{m=1}^{\infty} \frac{f(m)}{m^s} \sum_{n=1}^{\infty} \frac{f(n)\mu(n)}{n^s} = \sum_{m,n=1}^{\infty} \frac{f(mn)\mu(n)}{(mn)^s} = \sum_{t=1}^{\infty} \frac{f(t)}{t^s} \sum_{d \mid t} \mu(d) = 1. \]

* See, for example, Volume I, Theorems 6-5 and 6-6.

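THEOREM 6-13. For each χ, the relation

\[ \frac{L'(s, \chi)}{L(s, \chi)} = -\sum_{n=1}^{\infty} \frac{\chi(n)\Lambda(n)}{n^s} \tag{17} \]

holds for s > 1, where

\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^{\alpha} \text{ for some } \alpha > 0 \text{ and prime } p, \\ 0 & \text{otherwise.} \end{cases} \]

Proof: By the preceding theorem and the expression (13) for
L’(s, χ), we have, for s > 1,

\[ \begin{aligned}
\frac{L'(s, \chi)}{L(s, \chi)} &= -\sum_{m=1}^{\infty} \frac{\chi(m) \log m}{m^s} \sum_{j=1}^{\infty} \frac{\chi(j)\mu(j)}{j^s} \\
&= -\sum_{m,j=1}^{\infty} \frac{\chi(mj)\mu(j) \log m}{(mj)^s} \\
&= -\sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} \sum_{d \mid n} \mu(d) \log \frac{n}{d}.
\end{aligned} \]

But from the obvious relation

\[ \log n = \sum_{d \mid n} \Lambda(d) \]

and the Möbius inversion formula quoted above, we have

\[ \Lambda(n) = \sum_{d \mid n} \mu(d) \log \frac{n}{d}, \]

and the theorem follows.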
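The two relations between log n and Λ(n) used in this proof can be confirmed numerically for small n. The sketch below is an illustration; the names `mobius`, `von_mangoldt`, and `lambda_by_inversion` are hypothetical, and the comparison is done in floating point, so equality is only tested up to a small tolerance.

```python
import math

def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    if n == 1:
        return 1
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power p^a (a > 0) of a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)          # n itself is prime

def lambda_by_inversion(n):
    """Sum over d | n of mu(d) * log(n/d)."""
    return sum(mobius(d) * math.log(n // d)
               for d in range(1, n + 1) if n % d == 0)

if __name__ == "__main__":
    for n in (1, 2, 8, 12, 27, 30, 49):
        print(n, von_mangoldt(n), lambda_by_inversion(n))
```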


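THEOREM 6-14. For each χ, the function

\[ G(s, \chi) = \frac{L'(s, \chi)}{L(s, \chi)} + \sum_p \frac{\chi(p) \log p}{p^s} \tag{18} \]

is bounded in absolute value for s ≥ 1.

Proof: Equation (17) may be rewritten in the form

\[ \frac{L'}{L}(s, \chi) = -\sum_p \frac{\chi(p) \log p}{p^s} - \sum_p \sum_{m=2}^{\infty} \frac{\chi(p^m) \log p}{p^{ms}}, \]

and

\[ \left| \sum_p \sum_{m=2}^{\infty} \frac{\chi(p^m) \log p}{p^{ms}} \right| \le \sum_p \log p \sum_{m=2}^{\infty} \frac{1}{p^{ms}} = \sum_p \frac{\log p}{p^s(p^s - 1)} \le \sum_p \frac{\log p}{p^{2s}(1 - 2^{-s})}, \]

and the last series clearly converges for s > 1/2.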

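We can now complete the proof of Dirichlet’s theorem in much
the same way as before. Multiplying both sides of (18) by 1/χ(l),
and summing over all χ in X(k), we obtain

\[ \begin{aligned}
\sum_{\chi} \frac{1}{\chi(l)} \frac{L'(s, \chi)}{L(s, \chi)} &= -\sum_p \frac{\log p}{p^s} \sum_{\chi} \frac{\chi(p)}{\chi(l)} + \sum_{\chi} \frac{G(s, \chi)}{\chi(l)} \\
&= -h \sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{\log p}{p^s} + \sum_{\chi} \frac{G(s, \chi)}{\chi(l)}.
\end{aligned} \]

Now let s → 1+. The second term on the right remains bounded.
Assuming again that L(1, χ) ≠ 0 for $\chi \ne \chi_0$, the quantity 1/L(s, χ)
is also bounded for s sufficiently close to 1, since L is continuous at
s = 1. For $\chi \ne \chi_0$, L’(s, χ) remains bounded, by Theorem 6-9. On
the other hand,

\[ -\frac{L'(s, \chi_0)}{L(s, \chi_0)} = \sum_{(n,k)=1} \frac{\Lambda(n)}{n^s} \ge \sum_{p \nmid k} \frac{\log p}{p^s} \ge \log 2 \sum_{p \nmid k} \frac{1}{p^s} = \log 2 \, \bigl( \log L(s, \chi_0) - F(s, \chi_0) \bigr), \]

by (15), and the quantity $\log L(s, \chi_0) - F(s, \chi_0)$ increases
without bound as s → 1+, by Theorem 6-10. It follows that

\[ \lim_{s \to 1^+} \sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{\log p}{p^s} = \infty, \]

and the theorem is again proved, except for the verification of the
fact that L(1, χ) ≠ 0 for $\chi \ne \chi_0$.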

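6-6 Proof that L(1, χ) ≠ 0

THEOREM 6-15. If χ assumes a nonreal value for some n, then
L(1, χ) ≠ 0.

Proof: Let χ be such a character, and let $\bar{\chi}$ be the function whose
value for each a is the complex conjugate of that of χ. Clearly $\bar{\chi}$ is
also a character, and $\bar{\chi} \ne \chi$. But if L(1, χ) = 0, then also

\[ L(1, \bar{\chi}) = \overline{L(1, \chi)} = 0, \]

so at least two L-functions must vanish in this case. Since L(s, χ) is
differentiable at s = 1, the quantities

\[ L'(1, \chi) = \lim_{s \to 1} \frac{L(s, \chi)}{s - 1} \quad \text{and} \quad L'(1, \bar{\chi}) = \lim_{s \to 1} \frac{L(s, \bar{\chi})}{s - 1} \]

exist, so that there is a number A such that

\[ \lim_{s \to 1^+} \frac{\prod_{\chi \ne \chi_0} L(s, \chi)}{(s - 1)^2} = A. \]

Since

\[ \lim_{s \to 1^+} (s - 1) L(s, \chi_0) = \frac{h}{k}, \]

we deduce that

\[ \lim_{s \to 1^+} \prod_{\chi} L(s, \chi) = \lim_{s \to 1^+} (s - 1) \bigl( (s - 1) L(s, \chi_0) \bigr) \frac{\prod_{\chi \ne \chi_0} L(s, \chi)}{(s - 1)^2} = 0 \cdot \frac{h}{k} \cdot A = 0. \]

But by (14),

\[ \begin{aligned}
\sum_{\chi} \log L(s, \chi) &= \sum_{\chi} \sum_p \sum_{m=1}^{\infty} \frac{\chi(p^m)}{m p^{ms}} \\
&= \sum_p \sum_{m=1}^{\infty} \frac{1}{m p^{ms}} \sum_{\chi} \chi(p^m) \\
&= h \sum_{\substack{p,\, m \\ p^m \equiv 1 \ (\mathrm{mod}\ k)}} \frac{1}{m p^{ms}} > 0
\end{aligned} \]

for s > 1, so that

\[ \lim_{s \to 1^+} \prod_{\chi} L(s, \chi) \ge e^0 = 1. \]

This contradiction establishes the theorem.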


It would not be easy to avoid the use of the complex logarithm in
this proof, since the Dirichlet series for $\prod_{\chi} L(s, \chi)$ has very complicated
coefficients. To obtain an elementary proof, it is simpler to use a
different combination of L-functions. Unfortunately, the choice we
make can hardly be motivated by an elementary argument, but must
remain a deus ex machina until Section 7-3. It is the left side of the
inequality

\[ L^3(s, \chi_0) \, |L(s, \chi)|^4 \, |L(s, \chi^2)|^2 \ge 1; \tag{19} \]

this inequality we now show to be valid for s > 1.
Note first that for z = r(cos θ + i sin θ),

\[ |1 - z|^2 = |1 - r\cos\theta - ir\sin\theta|^2 = 1 - 2r\cos\theta + r^2, \]

and that for arbitrary real θ,

\[ 2\cos\theta + \cos 2\theta = 2\cos\theta + 2\cos^2\theta - 1 = 2(\cos\theta + \tfrac{1}{2})^2 - \tfrac{3}{2} \ge -\tfrac{3}{2}. \]

Using the fact that the geometric mean of three positive numbers is
at most equal to their arithmetic mean, we see that, if p ∤ k and

\[ \chi(p) = \cos\theta_p + i\sin\theta_p, \]

then

\[ \begin{aligned}
\left( \left| 1 - \frac{\chi(p)}{p^s} \right|^2 \right)^2 \left| 1 - \frac{\chi^2(p)}{p^s} \right|^2
&= (1 - 2p^{-s}\cos\theta_p + p^{-2s})^2 (1 - 2p^{-s}\cos 2\theta_p + p^{-2s}) \\
&\le \left( 1 - \tfrac{2}{3} p^{-s} (2\cos\theta_p + \cos 2\theta_p) + p^{-2s} \right)^3 \\
&\le (1 + p^{-s} + p^{-2s})^3 < \left( \frac{1}{1 - p^{-s}} \right)^3,
\end{aligned} \]

or

\[ (1 - \chi_0(p)p^{-s})^3 \, |1 - \chi(p)p^{-s}|^4 \, |1 - \chi^2(p)p^{-s}|^2 \le 1. \]

This inequality also holds if p | k, and, multiplying over all p, we obtain
(19).
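The local inequality behind (19) depends only on $p^{-s}$ and the angle $\theta_p$, so it can be scanned numerically. The sketch below is illustrative (the names `local_factor` and `worst_case` are hypothetical): it evaluates the left side on a grid of angles for a few primes and exponents and reports the largest value found, which should not exceed 1.

```python
import cmath
import math

def local_factor(p, s, theta):
    """(1 - p^-s)^3 * |1 - chi(p) p^-s|^4 * |1 - chi(p)^2 p^-s|^2
    for a hypothetical character value chi(p) = exp(i*theta)."""
    x = p ** (-s)
    z = cmath.exp(1j * theta)
    return (1 - x) ** 3 * abs(1 - z * x) ** 4 * abs(1 - z * z * x) ** 2

def worst_case(p, s, steps=3600):
    """Largest value of the local factor over a grid of angles theta."""
    return max(local_factor(p, s, 2 * math.pi * j / steps) for j in range(steps))

if __name__ == "__main__":
    for p in (2, 3, 5, 7):
        for s in (1.01, 1.5, 2.0):
            print(p, s, worst_case(p, s))
    # the trigonometric inequality 2 cos(t) + cos(2t) >= -3/2 used above
    print(min(2 * math.cos(t / 1000) + math.cos(2 * t / 1000) for t in range(6284)))
```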
It is now simple to prove that L(1, χ) ≠ 0 if $\chi^2 \ne \chi_0$, that is, if χ is
nonreal. Supposing the opposite, and using the fact that L’(s, χ) is
continuous at s = 1, we have that for $1 < s < s_1$,

\[ |L(s, \chi)| = |L(s, \chi) - L(1, \chi)| = \left| \int_1^s L'(u, \chi)\, du \right| \le A_1 (s - 1), \]

where

\[ A_1 = \max_{1 \le s \le s_1} |L'(s, \chi)|. \]

But now (19) can be recast in the form

\[ (s - 1) \bigl( (s - 1) L(s, \chi_0) \bigr)^3 \left| \frac{L(s, \chi)}{s - 1} \right|^4 |L(s, \chi^2)|^2 \ge 1, \]

in which the first factor tends to zero and the others remain bounded,
as s → 1+. This inequality is false for some s > 1, and the contradiction
shows that L(1, χ) ≠ 0.

No device of this sort has been found for the case that χ(n) is real
for all n. Showing that L(1, χ) ≠ 0 for a real character is the most
difficult point in the entire proof. Dirichlet effected it by showing
that L(1, χ) is a factor in the class number of a certain quadratic
field. This and other algebraic proofs require a considerable amount
of background; we shall content ourselves with an elementary and a
function-theoretic proof. We first sketch the idea.

If s > 1, then

\[ \zeta(s) L(s, \chi) = \sum_{m=1}^{\infty} \frac{1}{m^s} \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \sum_{t=1}^{\infty} \frac{1}{t^s} \sum_{d \mid t} \chi(d), \]

so that if we put

\[ f(n) = \sum_{d \mid n} \chi(d), \tag{20} \]

then

\[ \zeta(s) L(s, \chi) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \tag{21} \]

for s > 1.
By Theorem 6-16, below,

\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \ge \sum_{m=1}^{\infty} \frac{1}{m^{2s}} = \zeta(2s), \]

so that even if the series $\sum f(n) n^{-s}$ converges to the right of s = 1/2,
it is certainly not bounded near s = 1/2. In the analytic proof, we show
that (21) is correct for s > 1/2 if L(1, χ) = 0, and obtain the contradiction

\[ \lim_{s \to 1/2^+} L(s, \chi)\zeta(s) = L(\tfrac{1}{2}, \chi)\, \zeta(\tfrac{1}{2}) = \infty. \]

In the elementary proof, questions of convergence are avoided by
considering partial sums for s = 1/2 rather than the full series for s
near 1/2. It will be shown that

\[ \sum_{n \le x} \frac{f(n)}{\sqrt{n}} = 2\sqrt{x}\, L(1, \chi) + O(1), \]

and also that the sum on the left tends to infinity with x, so that the
relation L(1, χ) = 0 is impossible.


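THEOREM 6-16. With f as in (20),

\[ f(n) \begin{cases} \ge 0 & \text{for all } n, \\ \ge 1 & \text{for square } n. \end{cases} \]

Proof: Being the arithmetic sum function of a multiplicative function,
f is itself multiplicative,* so that

\[ f(p_1^{\alpha_1} \cdots p_r^{\alpha_r}) = f(p_1^{\alpha_1}) \cdots f(p_r^{\alpha_r}). \]

Since χ is a real character, χ(p) = 0 or ±1 for each prime p, and

\[ f(p^{\alpha}) = \sum_{\beta=0}^{\alpha} \chi(p^{\beta}) = \sum_{\beta=0}^{\alpha} (\chi(p))^{\beta} = \begin{cases} 1 + 0 + \cdots + 0 & \text{if } \chi(p) = 0, \\ 1 + 1 + \cdots + 1 & \text{if } \chi(p) = 1, \\ 1 - 1 + \cdots + (-1)^{\alpha} & \text{if } \chi(p) = -1. \end{cases} \]

Hence

\[ f(p^{\alpha}) = \begin{cases} 1 & \text{if } \chi(p) = 0, \\ \alpha + 1 & \text{if } \chi(p) = 1, \\ 1 & \text{if } \chi(p) = -1,\ \alpha \text{ even}, \\ 0 & \text{if } \chi(p) = -1,\ \alpha \text{ odd}, \end{cases} \]

and the theorem follows.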
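For a concrete real character the local formula for $f(p^{\alpha})$ can be compared with the divisor sum (20). The sketch below is an illustration that takes the character modulo 4 from Section 6-1 as its example; the names `chi4`, `f`, and `f_from_factorization` are hypothetical.

```python
def chi4(n):
    """The real character modulo 4: 0 for even n, (-1)^((n-1)/2) for odd n."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def f(n):
    """f(n) = sum of chi4(d) over the divisors d of n, as in (20)."""
    return sum(chi4(d) for d in range(1, n + 1) if n % d == 0)

def f_from_factorization(n):
    """Product of the local values f(p^a) listed in the proof of Theorem 6-16."""
    value, p = 1, 2
    while n > 1:
        if n % p == 0:
            a = 0
            while n % p == 0:
                n //= p
                a += 1
            c = chi4(p)
            if c == 1:
                value *= a + 1
            elif c == -1:
                value *= 1 if a % 2 == 0 else 0
            # c == 0 contributes the factor 1
        p += 1
    return value

if __name__ == "__main__":
    print([f(n) for n in range(1, 26)])
    print(all(f(n) == f_from_factorization(n) for n in range(1, 501)))
    print(all(f(m * m) >= 1 for m in range(1, 31)))
```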


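THEOREM 6-17. The relations

\[ \sum_{n \le x} \chi(n) = O(1) \tag{22} \]

and

\[ \sum_{n \ge x} \frac{\chi(n)}{n^s} = O\!\left( \frac{1}{x^s} \right) \tag{23} \]

hold as x → ∞, for s > 0.

* Cf. Volume I, Theorem 6-3.

Proof: We have already noticed that if

\[ S(x) = \sum_{n \le x} \chi(n), \]

then |S(x)| < h, which implies (22). Using this, we have, for integral x,

\[ \begin{aligned}
\left| \sum_{n \ge x} \frac{\chi(n)}{n^s} \right| &= \left| \sum_{r \ge x} \frac{S(r) - S(r - 1)}{r^s} \right| \\
&= \left| \sum_{r \ge x} S(r) \left( \frac{1}{r^s} - \frac{1}{(r+1)^s} \right) - \frac{S(x - 1)}{x^s} \right| \\
&\le h \sum_{r \ge x} \left( \frac{1}{r^s} - \frac{1}{(r+1)^s} \right) + \frac{h}{x^s} = \frac{2h}{x^s},
\end{aligned} \]

which implies (23).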


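THEOREM 6-18. There is a constant C such that

\[ \sum_{n \le x} \frac{1}{\sqrt{n}} = 2\sqrt{x} + C + O\!\left( \frac{1}{\sqrt{x}} \right). \]

Proof: Put

\[ \varepsilon_n = 2\sqrt{n} - 2\sqrt{n - 1} - \frac{1}{\sqrt{n}} = \int_{n-1}^{n} \left( \frac{1}{\sqrt{t}} - \frac{1}{\sqrt{n}} \right) dt, \]

so that

\[ \sum_{n=2}^{x} \varepsilon_n = 2\sqrt{x} - 2 - \sum_{n=2}^{x} \frac{1}{\sqrt{n}}. \]

Now $\varepsilon_n$, being the area of the triangular region bounded by the curve
$y = x^{-1/2}$ and the lines x = n - 1 and $y = n^{-1/2}$, is positive and smaller
than $(n - 1)^{-1/2} - n^{-1/2}$, so that the series

\[ \sum_{n=2}^{\infty} \varepsilon_n \]

converges, and

\[ \sum_{n=x+1}^{\infty} \varepsilon_n < \sum_{n=x+1}^{\infty} \left( \frac{1}{\sqrt{n - 1}} - \frac{1}{\sqrt{n}} \right) = \frac{1}{\sqrt{x}}. \]

Hence

\[ \sum_{n=1}^{x} \frac{1}{\sqrt{n}} = 2\sqrt{x} - 1 - \sum_{n=2}^{\infty} \varepsilon_n + \sum_{n=x+1}^{\infty} \varepsilon_n = 2\sqrt{x} + C + O\!\left( \frac{1}{\sqrt{x}} \right). \]

This proves the theorem for integral x; its extension to real x is
immediate.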
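The asymptotic formula of Theorem 6-18 is easy to observe numerically. The sketch below is illustrative only; the names are hypothetical. The theorem asserts merely that some constant C exists; the identification of C with ζ(1/2) ≈ -1.46035 is a standard fact that the text does not state, and it enters here as a quoted constant.

```python
import math

ZETA_HALF = -1.4603545088095868   # zeta(1/2); quoted value, not derived in the text

def sqrt_reciprocal_sum(x):
    """Sum of 1/sqrt(n) for 1 <= n <= x."""
    return sum(1.0 / math.sqrt(n) for n in range(1, int(x) + 1))

def remainder(x):
    """sum_{n <= x} n^(-1/2) - 2*sqrt(x), which Theorem 6-18 says tends to C."""
    return sqrt_reciprocal_sum(x) - 2.0 * math.sqrt(x)

if __name__ == "__main__":
    for x in (10**2, 10**4, 10**6):
        r = remainder(x)
        # the last column stays bounded, in line with the O(1/sqrt(x)) term
        print(x, r, (r - ZETA_HALF) * math.sqrt(x))
```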


226 DIRICHLET’S THEOREM 


[cHAP. 6 
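THEOREM 6-19. If $\chi \ne \chi_0$ is a real character, then $L(1, \chi) \ne 0$.

Proof: Put

$$G(x) = \sum_{n \le x} \frac{f(n)}{\sqrt{n}}.$$

By Theorem 6-16,

$$G(x) \ge \sum_{m \le \sqrt{x}} \frac{1}{\sqrt{m^2}} = \sum_{m \le \sqrt{x}} \frac{1}{m},$$

so that $G(x) \to \infty$ with $x$.
On the other hand,

$$G(x) = \sum_{n \le x} \frac{1}{\sqrt{n}} \sum_{d \mid n} \chi(d) = \sum_{uv \le x} \frac{\chi(v)}{\sqrt{uv}}.$$

This sum, extended over the lattice points $u, v$ for which $u \ge 1$,
$v \ge 1$, $uv \le x$, we split into two parts, as indicated in Fig. 6-1:

[FIGURE 6-1]

$$G(x) = \sum_{u \le \sqrt{x}} \ \sum_{\sqrt{x} < v \le x/u} \frac{\chi(v)}{\sqrt{uv}} + \sum_{v \le \sqrt{x}} \ \sum_{u \le x/v} \frac{\chi(v)}{\sqrt{uv}}$$

$$= \sum_{u \le \sqrt{x}} \frac{1}{\sqrt{u}} \sum_{\sqrt{x} < v \le x/u} \frac{\chi(v)}{\sqrt{v}} + \sum_{v \le \sqrt{x}} \frac{\chi(v)}{\sqrt{v}} \sum_{u \le x/v} \frac{1}{\sqrt{u}}$$

$$= \sum_{u \le \sqrt{x}} \frac{1}{\sqrt{u}}\, O(x^{-1/4}) + \sum_{v \le \sqrt{x}} \frac{\chi(v)}{\sqrt{v}} \left( 2\sqrt{\frac{x}{v}} + C + O\left( \sqrt{\frac{v}{x}} \right) \right)$$

$$= O(x^{-1/4}) \cdot O(x^{1/4}) + 2\sqrt{x} \sum_{v \le \sqrt{x}} \frac{\chi(v)}{v} + C \cdot O(1) + \frac{1}{\sqrt{x}}\, O(\sqrt{x})$$

$$= 2\sqrt{x} \left( L(1, \chi) + O\left( \frac{1}{\sqrt{x}} \right) \right) + O(1),$$

so that

$$\sum_{n \le x} \frac{f(n)}{\sqrt{n}} = 2\sqrt{x}\, L(1, \chi) + O(1). \tag{24}$$

Thus, if $L(1, \chi)$ were zero, $G(x)$ would remain bounded as $x \to \infty$,
which is not the case.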
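Relation (24) can be observed numerically. The sketch below is an
illustration under stated assumptions, not part of the proof: it again
assumes $f(n) = \sum_{d \mid n} \chi(d)$ and takes $\chi$ to be the
nonprincipal character modulo 4, for which $L(1, \chi) = \pi/4$ by the
Leibniz series. The names chi4 and G are ours.

```python
from math import pi, sqrt


def chi4(n: int) -> int:
    """Nonprincipal real character modulo 4 (assumed example character)."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def G(x: int) -> float:
    """Return sum_{n <= x} f(n)/sqrt(n), with f(n) = sum_{d | n} chi4(d)."""
    f = [0] * (x + 1)
    for d in range(1, x + 1):
        c = chi4(d)
        if c:
            # d contributes chi4(d) to f(n) for every multiple n of d
            for n in range(d, x + 1, d):
                f[n] += c
    return sum(f[n] / sqrt(n) for n in range(1, x + 1))


x = 10**5
ratio = G(x) / (2.0 * sqrt(x))  # should be close to L(1, chi) = pi/4
```

The quotient $G(x)/(2\sqrt{x})$ differs from $\pi/4$ by $O(x^{-1/2})$, as (24)
predicts, and $G(x)$ itself grows without bound.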


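A rather more straightforward (function-theoretic) proof can be
obtained by extending (21), which we know to be valid for $s > 1$, to
the range $s > 1/2$, under the assumption that $L(1, \chi) = 0$. By an
argument quite similar to, but slightly simpler than that which yielded
(24), it can be shown that if $L(1, \chi) = 0$, then

$$\sum_{n \le x} f(n) = O(\sqrt{x}).$$

Theorem 6-2 implies that the series

$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s}$$

converges for $s > 1/2$. Now let $\sigma_0$ be a real number greater than $1/2$,
and let $s$ be a complex number with $\operatorname{Re} s = \sigma > \sigma_0$. Then for
$v > u > 1$, we have

$$\sum_{n=u}^{v} \frac{f(n)}{n^s} = \sum_{n=u}^{v} \frac{f(n)}{n^{\sigma_0}} \cdot \frac{1}{n^{s-\sigma_0}} = \sum_{n=u}^{v} \left( \sum_{m=1}^{n} \frac{f(m)}{m^{\sigma_0}} - \sum_{m=1}^{n-1} \frac{f(m)}{m^{\sigma_0}} \right) \frac{1}{n^{s-\sigma_0}}$$

$$= \sum_{n=u}^{v-1} \sum_{m=1}^{n} \frac{f(m)}{m^{\sigma_0}} \left( \frac{1}{n^{s-\sigma_0}} - \frac{1}{(n+1)^{s-\sigma_0}} \right) - \frac{1}{u^{s-\sigma_0}} \sum_{m=1}^{u-1} \frac{f(m)}{m^{\sigma_0}} + \frac{1}{v^{s-\sigma_0}} \sum_{m=1}^{v} \frac{f(m)}{m^{\sigma_0}},$$

so that

$$\left| \sum_{n=u}^{v} \frac{f(n)}{n^s} \right| \le A \sum_{n=u}^{v-1} \left| \frac{1}{n^{s-\sigma_0}} - \frac{1}{(n+1)^{s-\sigma_0}} \right| + A u^{-(\sigma-\sigma_0)} + A v^{-(\sigma-\sigma_0)}$$

$$= A \sum_{n=u}^{v-1} \left| (s - \sigma_0) \int_n^{n+1} \frac{dx}{x^{s-\sigma_0+1}} \right| + A u^{-(\sigma-\sigma_0)} + A v^{-(\sigma-\sigma_0)}$$

$$\le A\, |s - \sigma_0| \int_u^{\infty} \frac{dx}{x^{\sigma-\sigma_0+1}} + 2A u^{-(\sigma-\sigma_0)} = A\, \frac{|s - \sigma_0|}{\sigma - \sigma_0}\, u^{-(\sigma-\sigma_0)} + 2A u^{-(\sigma-\sigma_0)},$$

where $A$ is such that

$$\left| \sum_{n=1}^{N} \frac{f(n)}{n^{\sigma_0}} \right| < A, \quad \text{for all } N \ge 1.$$

It follows that the series

$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s}$$

converges to an analytic function in the half-plane $\sigma > 1/2$. Since it
coincides with $L(s, \chi)\zeta(s)$ for $s$ real and larger than 1, it represents
an analytic continuation of $L(s, \chi)\zeta(s)$ for $\sigma > 1/2$. But it is unbounded
near $s = 1/2$, while $L(s, \chi)\zeta(s)$ is not.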


CHAPTER 7 


THE PRIME NUMBER THEOREM 


7-1 Introduction. It is shown in elementary number theory
texts* that if

$$\lim_{x \to \infty} \frac{\pi(x)}{x/\log x}$$

exists, it must have the value 1, and that there are positive constants
$c$ and $c'$ such that for $x \ge 2$,

$$c < \frac{\pi(x)}{x/\log x} < c'.$$

Neither of these results implies the other, of course; together they
show that

$$0 < \liminf_{x \to \infty} \frac{\pi(x)}{x/\log x} \le 1 \le \limsup_{x \to \infty} \frac{\pi(x)}{x/\log x} < \infty.$$

(Here, as always, $\pi(x)$ denotes the number of primes less than or
equal to $x$.) Both results were obtained by Chebyshev in 1851 and
1852 (in rather more precise form), but it was not until some forty-five
years later that the final link was supplied by Hadamard and
de la Vallée Poussin, who showed independently that the limit
actually exists, and thus proved the Prime Number Theorem. Both
proofs made essential use of the theory of functions of a complex
variable, and despite much effort it seemed for many years impossible
to give a proof entirely free of considerations as sophisticated as
this theory. In 1948, however, P. Erdős and A. Selberg gave a
completely elementary proof. More precisely, Selberg proved the
fundamental relation

$$\sum_{p \le x} \log^2 p + \sum_{pq \le x} \log p \log q = 2x \log x + O(x),$$

and he and Erdős deduced the Prime Number Theorem from it.†


* See, for example, Volume I, Sections 6-6 and 6-7. 

† Excellent expositions of this proof are given in T. Nagell, Introduction
to Number Theory (New York: John Wiley & Sons, 1951) and in G. H.
Hardy and E. M. Wright, An Introduction to the Theory of Numbers (3rd
edition, New York: Oxford University Press, 1954).




We present a proof based on the behavior of the $\zeta$-function for
complex $s$. Throughout this chapter, familiarity with the contents of
a standard course in the theory of functions of a complex variable
is presupposed.


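Before going into detail, we outline the proof. Our object is to get
an estimate for

$$\pi(x) = \sum_{p \le x} 1 = \sum_{n=1}^{x} P(n),$$

where $P$ is the characteristic function of the primes:

$$P(n) = \begin{cases} 1 & \text{if } n \text{ is prime}, \\ 0 & \text{otherwise}. \end{cases}$$

While $P$ itself does not arise in a natural way, the function $P^*$ such
that

$$P^*(n) = \begin{cases} 1/m & \text{if } n = p^m \text{ for some } m, p, \\ 0 & \text{otherwise}, \end{cases}$$

occurs in the Dirichlet series for $\log \zeta(s)$:

$$\log \zeta(s) = \sum_{m,p} \frac{1}{m p^{ms}} = \sum_{n=1}^{\infty} \frac{P^*(n)}{n^s}. \tag{1}$$

For fixed $m$, the number of $m$th powers of primes which do not exceed
$x$ is equal to the number of primes which do not exceed $x^{1/m}$, so that

$$\sum_{n=1}^{x} P^*(n) = \sum_{n=1}^{x} P(n) + \frac{1}{2} \sum_{n=1}^{x^{1/2}} P(n) + \frac{1}{3} \sum_{n=1}^{x^{1/3}} P(n) + \cdots = \pi(x) + \frac{\pi(x^{1/2})}{2} + \frac{\pi(x^{1/3})}{3} + \cdots,$$

and since, for $m \ge 2$,

$$\pi(x^{1/m}) \le x^{1/m} \le \sqrt{x} = o\left( \frac{x}{\log x} \right),$$

it is to be expected that

$$\sum_{n=1}^{x} P^*(n) \sim \pi(x).$$

In light of (1), the present case is a specialization of the following
problem: given a function

$$f(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s},$$

to estimate

$$\sum_{n \le x} a_n. \tag{2}$$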
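It will be shown that

$$I(w) = \frac{1}{2\pi i} \int_{2-\infty i}^{2+\infty i} \frac{e^{ws}}{s^2}\, ds = \begin{cases} w & \text{if } w \ge 0, \\ 0 & \text{if } w < 0, \end{cases}$$

so that $I(w)$ is closely related to the characteristic function of the
positive real numbers. If we put

$$w = \log \frac{x}{n},$$

this gives

$$\frac{1}{2\pi i} \int_{2-\infty i}^{2+\infty i} \frac{(x/n)^s}{s^2}\, ds = \begin{cases} \log (x/n) & \text{if } n \le x, \\ 0 & \text{if } n \ge x, \end{cases}$$

so that

$$\frac{1}{2\pi i} \int_{2-\infty i}^{2+\infty i} \frac{x^s f(s)}{s^2}\, ds = \sum_{n \le x} a_n \log \frac{x}{n}.$$

If $\delta = \delta(x)$ tends monotonically to zero as $x \to \infty$, then

$$\sum_{n \le x(1+\delta)} a_n \log \frac{x(1+\delta)}{n} - \sum_{n \le x} a_n \log \frac{x}{n} = \log (1+\delta) \sum_{n \le x} a_n + \sum_{x < n \le x + \delta x} a_n \log \frac{x(1+\delta)}{n}$$

$$= \log (1+\delta) \sum_{n \le x} a_n + O\left( \log (1+\delta) \sum_{x < n \le x + \delta x} |a_n| \right).$$

If the remainder term here is of smaller order of magnitude than the
first term for suitable choice of $\delta$, then

$$\sum_{n \le x} a_n \sim \frac{1}{2\pi i \delta} \int_{2-\infty i}^{2+\infty i} \frac{x^s \left( (1+\delta)^s - 1 \right) f(s)}{s^2}\, ds,$$

and the problem reduces to that of obtaining an adequate estimate
of this integral. To do this, we replace the line of integration by a
suitable large closed contour, inside and on which we have sufficient
information about $f(s)$ to apply standard contour-integral techniques.
In the case at hand, the estimation of the integral in the last relation
requires some knowledge of the zeros, poles, and size of $\zeta(s)$.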
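The expectation stated in the outline, that the weighted count of prime
powers $\sum_{n \le x} P^*(n)$ is asymptotic to $\pi(x)$, is already plausible for
small $x$. The following Python sketch is ours and purely illustrative;
the helper names primes_upto and weighted_prime_power_count are
hypothetical.

```python
def primes_upto(n: int) -> list:
    """Sieve of Eratosthenes: all primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if sieve[i]]


def weighted_prime_power_count(x: int) -> float:
    """Sum of P*(n) for n <= x: each prime power p**m <= x contributes 1/m."""
    total = 0.0
    for p in primes_upto(x):
        power, m = p, 1
        while power <= x:
            total += 1.0 / m
            power *= p
            m += 1
    return total


x = 10**4
pi_x = len(primes_upto(x))
excess = weighted_prime_power_count(x) - pi_x  # contribution of m >= 2
```

The excess comes only from the terms with $m \ge 2$ and is at most of
order $\sqrt{x}$, which is small beside $\pi(x)$.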




7-2 Preliminary results. Following the odd but harmless tradition
in analytic number theory, we designate by $\sigma$ and $t$ the real and
imaginary parts of the complex variable $s$. For $x > 0$, $x^s$ means
$e^{s \log x}$, where $\log x$ indicates the real logarithm.

When we have proved the Prime Number Theorem, we shall consider
some other rather similar problems, and for one of these it will
be necessary to use not the Riemann $\zeta$-function but the so-called
Hurwitz $\zeta$-function, defined for $0 < w \le 1$ and $\sigma > 1$ by the equation

$$\zeta(s, w) = \sum_{n=0}^{\infty} \frac{1}{(n + w)^s}.$$

Since $\zeta(s, 1) = \zeta(s)$, and since the requisite properties are no more
difficult to prove for $\zeta(s, w)$ than for $\zeta(s)$, we consider the more
general function.


THEOREM 7-1. For any $\sigma_0 > 1$, the series

$$\sum_{n=0}^{\infty} (n + w)^{-s}$$

converges uniformly for $\sigma \ge \sigma_0$, so that $\zeta(s, w)$ is regular (or analytic)
for $\sigma > 1$.

Proof: We have

$$|(n + w)^{-s}| = |e^{-s \log (n+w)}| = e^{-\sigma \log (n+w)} = (n + w)^{-\sigma},$$

so that for $\sigma \ge \sigma_0$,

$$\sum_{n=0}^{\infty} |(n + w)^{-s}| \le \sum_{n=0}^{\infty} (n + w)^{-\sigma_0}.$$

Thus we have a series of analytic functions which is dominated
throughout the region $\sigma \ge \sigma_0$ by a convergent series of positive
constants, and which is therefore uniformly convergent, and the result
follows from Weierstrass' theorem.


THEOREM 7-2. If $a$ and $b$ are integers with $b > a \ge 0$, and if $f$
has a continuous derivative over $a \le x \le b$, then

$$\sum_{n=a+1}^{b} f(n) = \int_a^b f(u)\, du + \int_a^b (u - [u]) f'(u)\, du.$$

Proof: We have

$$\int_{n-1}^{n} u f'(u)\, du = n f(n) - (n-1) f(n-1) - \int_{n-1}^{n} f(u)\, du$$

$$= f(n) + (n-1) \left( f(n) - f(n-1) \right) - \int_{n-1}^{n} f(u)\, du$$

$$= f(n) + \int_{n-1}^{n} [u] f'(u)\, du - \int_{n-1}^{n} f(u)\, du,$$

from which the result follows by summing on $n$ from $a + 1$ to $b$.
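The summation formula of Theorem 7-2 can be tested numerically. The
sketch below is an illustration only; the choice $f(u) = u^{-2}$, the
limits $a = 1$, $b = 50$, and the helper names simpson and
euler_summation_sides are our own assumptions. The integral involving
$u - [u]$ is computed one unit interval at a time, since $[u] = k$ on
$[k, k+1)$.

```python
def simpson(g, a: float, b: float, n: int = 200) -> float:
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def euler_summation_sides(f, df, F, a: int, b: int):
    """Return both sides of Theorem 7-2; df is f' and F is an antiderivative of f."""
    lhs = sum(f(n) for n in range(a + 1, b + 1))
    # On [k, k+1) the factor u - [u] equals u - k.
    correction = sum(
        simpson(lambda u, k=k: (u - k) * df(u), k, k + 1) for k in range(a, b)
    )
    rhs = (F(b) - F(a)) + correction
    return lhs, rhs


lhs, rhs = euler_summation_sides(
    f=lambda u: u ** -2,
    df=lambda u: -2.0 * u ** -3,
    F=lambda u: -1.0 / u,
    a=1,
    b=50,
)
```

The two sides agree to within the accuracy of the quadrature.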


THEOREM 7-3. If $m$ is a non-negative integer, and $\sigma > 1$, then

$$\zeta(s, w) - \frac{1}{(s-1)(m+w)^{s-1}} = \sum_{n=0}^{m} \frac{1}{(n+w)^s} - s \int_m^{\infty} \frac{u - [u]}{(u+w)^{s+1}}\, du. \tag{3}$$

It follows that $\zeta(s, w) - 1/(s - 1)$ is regular for $\sigma > 0$, and that
(3) holds for $\sigma > 0$.

Proof: If $\sigma > 1$ and

$$f(u) = \frac{1}{(u + w)^s},$$

then the equation of Theorem 7-2 continues to hold if $b \to \infty$, and,
replacing $a$ by $m$, we have

$$\sum_{n=m+1}^{\infty} \frac{1}{(n+w)^s} = \frac{1}{(s-1)(m+w)^{s-1}} - s \int_m^{\infty} \frac{u - [u]}{(u+w)^{s+1}}\, du,$$

from which (3) follows. Since

$$\left| \frac{u - [u]}{(u+w)^{s+1}} \right| \le \frac{1}{(u+w)^{\sigma+1}} = O\left( \frac{1}{u^{\sigma+1}} \right),$$

the integral on the right side of (3) converges absolutely for $\sigma > 0$,
and uniformly for $\sigma \ge \sigma_0 > 0$. For arbitrary $n \ge 0$, the quantity

$$\int_n^{n+1} \frac{u - [u]}{(u+w)^{s+1}}\, du = \int_n^{n+1} \frac{u - n}{(u+w)^{s+1}}\, du$$

is a regular function of $s$ for $\sigma > 0$; the same is true of

$$\int_m^{\infty} \frac{u - [u]}{(u+w)^{s+1}}\, du = \sum_{n=m}^{\infty} \int_n^{n+1} \frac{u - [u]}{(u+w)^{s+1}}\, du$$

for $m \ge 0$. Finally, taking $m = 0$ in (3), we have

$$\zeta(s, w) - \frac{1}{s-1} = \frac{1}{w^s} + \frac{w^{1-s} - 1}{s - 1} - s \int_0^{\infty} \frac{u - [u]}{(u+w)^{s+1}}\, du,$$

and the right side is regular for $\sigma > 0$.

Equation (3) thus provides an analytic continuation of $\zeta(s, w)$
over the half-plane $\sigma > 0$. The function is actually analytic over the
entire plane, except for the pole at $s = 1$, but this fact is not needed.

Hereafter $c$ will denote a positive constant which depends only on
the arguments indicated; it need not have the same value in different
occurrences, unless it has a subscript.


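THEOREM 7-4. For $1/2 \le \sigma \le 2$ and $t > c(w)$,

$$|\zeta(s, w)| < t^{3/4}.$$

For $t \ge 8$ and $1 - (\log t)^{-1} \le \sigma \le 2$,

$$|\zeta(s, w)| < c(w) \log t.$$

Proof: For $1/2 \le \sigma \le 2$ and $t \ge 8$ we have $|s| \le 2 + t < 2t$ and
$|s - 1| \ge t > 1$. Hence if we take $m = [t] + 1$ in (3), we have

$$|\zeta(s, w)| < \frac{1}{t([t] + 1 + w)^{\sigma - 1}} + \sum_{n=0}^{[t]+1} \frac{1}{(n+w)^{\sigma}} + 2t \int_{[t]+1}^{\infty} \frac{du}{(u+w)^{\sigma+1}}$$

$$< \frac{1}{t([t] + 1 + w)^{\sigma - 1}} + c(w) + \sum_{n=1}^{[t]} \frac{1}{n^{\sigma}} + \frac{2t}{\sigma t^{\sigma}},$$

or

$$|\zeta(s, w)| < \frac{([t] + 1 + w)^{1-\sigma}}{t} + c(w) + \sum_{n=1}^{[t]} \frac{1}{n^{\sigma}} + 4t^{1-\sigma}. \tag{4}$$

Thus, for this same range of $\sigma$ and $t$,

$$|\zeta(s, w)| < \frac{([t] + 1 + w)^{1/2}}{t} + c(w) + \sum_{n=1}^{[t]} \frac{1}{\sqrt{n}} + 4\sqrt{t}$$

$$< 2\sqrt{t} + c(w) + \int_0^t \frac{du}{\sqrt{u}} + 4\sqrt{t} = 8\sqrt{t} + c(w),$$

and this is smaller than $t^{3/4}$ for $t > c(w)$.

Now take $t \ge 8 > e^2$. Then $1 - (\log t)^{-1} > 1/2$, so that if
$1 - (\log t)^{-1} \le \sigma \le 2$, the inequality (4) gives

$$|\zeta(s, w)| < \frac{(2t)^{1/\log t}}{t} + c(w) + \sum_{n=1}^{[t]} \frac{n^{1/\log t}}{n} + 4t^{1/\log t}$$

$$< \frac{2e}{t} + c(w) + e \sum_{n=1}^{[t]} \frac{1}{n} + 4e < c(w) \log t.$$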
n=l 
THEOREM 7-5. If, for $|x| < 1$,

$$f(x) = \sum_{n=1}^{\infty} a_n x^n$$

is regular and $\operatorname{Re} f(x) \le 1/2$, then $|a_n| \le 1$ for $n \ge 1$.

Proof: Since $|f(x)| \le |1 - f(x)|$ for $|x| < 1$, the function

$$\frac{f(x)}{1 - f(x)} = a_1 x + \cdots$$

is regular and has modulus at most 1 for $|x| < 1$. But the function

$$f_1(x) = \frac{f(x)}{x(1 - f(x))}$$

is also regular for $|x| < 1$, and its value at $x = 0$ is $a_1$; by the
maximum-modulus principle, its absolute value is at least as large at some
point on $|x| = 1$. Since for $|x| = 1$,

$$|f_1(x)| = \left| \frac{f(x)}{1 - f(x)} \right| \le 1,$$

it follows that

$$|a_1| \le 1. \tag{5}$$

The theorem will therefore be proved if we show that each of the
functions

$$F_n(x) = a_n x + a_{2n} x^2 + \cdots$$

fulfills the same hypotheses as $f(x)$ itself. This depends on the fact
that if $\eta = e^{2\pi i/n}$, then

$$\sum_{l=0}^{n-1} \eta^{kl} = \begin{cases} n & \text{if } n \mid k, \\ (\eta^{kn} - 1)/(\eta^k - 1) = 0 & \text{if } n \nmid k. \end{cases}$$

We have

$$\sum_{l=0}^{n-1} f(\eta^l x) = \sum_{k=1}^{\infty} a_k x^k \sum_{l=0}^{n-1} \eta^{kl} = n \sum_{m=1}^{\infty} a_{mn} x^{mn} = n F_n(x^n),$$

so that $F_n(x)$ is regular for $|x| < 1$, and for such $x$,

$$\operatorname{Re} F_n(x^n) = \frac{1}{n} \sum_{l=0}^{n-1} \operatorname{Re} f(\eta^l x) \le \frac{1}{n} \cdot \frac{n}{2} = \frac{1}{2}.$$
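THEOREM 7-6. Let $R$ be positive, and suppose that

$$f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n$$

is regular and $\operatorname{Re} f(x) \le M$ for $|x - x_0| < R$. Then, for $n \ge 1$,

$$|a_n| \le \frac{2}{R^n} (M - \operatorname{Re} a_0).$$

Proof: If $\operatorname{Re} a_0 = M$, then $a_n = 0$ for $n \ge 1$, by the
maximum-modulus principle.

If $\operatorname{Re} a_0 < M$, put

$$g(x) = \frac{f(x_0 + Rx) - a_0}{2(M - \operatorname{Re} a_0)}.$$

Then $g$ is regular for $|x| < 1$, $g(0) = 0$, and

$$\operatorname{Re} g(x) = \frac{\operatorname{Re} f(x_0 + Rx) - \operatorname{Re} a_0}{2(M - \operatorname{Re} a_0)} \le \frac{M - \operatorname{Re} a_0}{2(M - \operatorname{Re} a_0)} = \frac{1}{2}.$$

Hence $g$ satisfies the hypotheses of Theorem 7-5, so that

$$\left| \frac{a_n R^n}{2(M - \operatorname{Re} a_0)} \right| \le 1,$$

and the theorem follows.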


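THEOREM 7-7. If $f$ satisfies the hypotheses of Theorem 7-6, and
$0 < r < R$, then for $|x - x_0| \le r$,

$$|f(x)| \le |a_0| + \frac{2r}{R - r} (|M| + |a_0|)$$

and

$$|f'(x)| \le \frac{2R}{(R - r)^2} (|M| + |a_0|).$$

Proof: We have

$$|f(x)| \le |a_0| + \sum_{n=1}^{\infty} |a_n| r^n \le |a_0| + 2(|M| + |a_0|) \sum_{n=1}^{\infty} \left( \frac{r}{R} \right)^n = |a_0| + \frac{2r}{R - r} (|M| + |a_0|),$$

and

$$|f'(x)| \le \sum_{n=1}^{\infty} n |a_n| r^{n-1} \le \frac{2(|M| + |a_0|)}{R} \sum_{n=1}^{\infty} n \left( \frac{r}{R} \right)^{n-1} = \frac{2R}{(R - r)^2} (|M| + |a_0|).$$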

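THEOREM 7-8. Let $r$ be positive and $M$ real, and suppose that
$f(s_0) \ne 0$ and that, for $|s - s_0| \le r$, $f(s)$ is regular and

$$\left| \frac{f(s)}{f(s_0)} \right| < e^M.$$

Suppose also that $f(s) \ne 0$ in the semicircular region $|s - s_0| \le r$,
$\operatorname{Re} s > \operatorname{Re} s_0$. Then

$$-\operatorname{Re} \frac{f'}{f}(s_0) \le \frac{4M}{r},$$

and if there is a zero $\rho$ of $f$ on the open line segment between $s_0 - r/2$
and $s_0$, then

$$-\operatorname{Re} \frac{f'}{f}(s_0) \le \frac{4M}{r} - \frac{1}{s_0 - \rho}.$$

Proof: There is clearly no loss in generality in supposing that
$f(s_0) = 1$ and $s_0 = 0$. In this case, the hypotheses can be listed as
follows.

(1) For $|s| \le r$, $f(s)$ is regular and $|f(s)| < e^M$, where $M > 0$.

(2) $f(0) = 1$.

(3) $f(s) \ne 0$ for $|s| \le r$, $\sigma > 0$.

We look for an upper bound for $-\operatorname{Re} f'(0)$.

If $\rho$ runs through the zeros of $f$ in the circle $|s| \le r/2$, then the
function

$$g(s) = \frac{f(s)}{\prod_{\rho} (1 - s/\rho)}$$

is regular for $|s| \le r$. On the circle $|s| = r$, we have

$$\left| 1 - \frac{s}{\rho} \right| \ge \left| \frac{s}{\rho} \right| - 1 \ge 1,$$

so that here $|g(s)| \le |f(s)| < e^M$. By the maximum-modulus
principle,

$$|g(s)| < e^M, \quad \text{for } |s| \le \frac{r}{2}.$$

Since $g(s) \ne 0$ for $|s| \le r/2$, and $g(0) = 1$, we can write

$$g(s) = e^{G(s)}, \quad \text{for } |s| \le \frac{r}{2},$$

where $G$ is regular and $\operatorname{Re} G(s) < M$, $G(0) = 0$. By Theorem 7-6,
with $r/2$ instead of $R$,

$$|G'(0)| \le \frac{2M}{r/2} = \frac{4M}{r}.$$

But

$$G'(0) = \frac{g'(0)}{g(0)} = f'(0) + \sum_{\rho} \frac{1}{\rho},$$

so that

$$-\operatorname{Re} f'(0) - \sum_{\rho} \operatorname{Re} \frac{1}{\rho} \le |G'(0)| \le \frac{4M}{r},$$

$$-\operatorname{Re} f'(0) \le \frac{4M}{r} + \sum_{\rho} \operatorname{Re} \frac{1}{\rho}.$$

Since we have supposed that all zeros $\rho$ have nonpositive real parts,
the theorem follows.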
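If $f$ is regular on the vertical line $\sigma_0 + ti$, and if

$$\lim_{\substack{a \to \infty \\ b \to \infty}} \int_{\sigma_0 - ai}^{\sigma_0 + bi} f(s)\, ds = \lim_{\substack{a \to \infty \\ b \to \infty}} \int_{-a}^{b} f(\sigma_0 + ti)\, i\, dt$$

exists, then we abbreviate this limit to

$$\int_{(\sigma_0)} f(s)\, ds.$$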
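THEOREM 7-9. We have

$$\frac{1}{2\pi i} \int_{(2)} \frac{y^s}{s^2}\, ds = \begin{cases} 0 & \text{for } 0 < y \le 1, \\ \log y & \text{for } y \ge 1. \end{cases}$$

Proof: The integral converges, because

$$\left| \frac{y^s}{s^2} \right| = \frac{y^2}{4 + t^2}.$$

[FIGURE 7-1: the contours $C_1$, $C_2$, $C_3$, with the point $2 + ai$ marked]

First suppose that $0 < y \le 1$. Then in the region bounded by
$C_1$ and $C_2$ (see Fig. 7-1), the integrand is regular, so that by
Cauchy's theorem,

$$\int_{C_1} \frac{y^s}{s^2}\, ds + \int_{C_2} \frac{y^s}{s^2}\, ds = 0.$$

But along $C_2$, which is of length $\pi a$, we have

$$\left| \frac{y^s}{s^2} \right| \le \frac{1}{a^2},$$

so that

$$\left| \int_{C_2} \frac{y^s}{s^2}\, ds \right| \le \frac{\pi a}{a^2} = \frac{\pi}{a}.$$

Hence, as $a \to \infty$,

$$\int_{C_2} \frac{y^s}{s^2}\, ds \to 0, \qquad \int_{C_1} \frac{y^s}{s^2}\, ds \to \int_{(2)} \frac{y^s}{s^2}\, ds,$$

and the result follows.

Now suppose that $y \ge 1$, and that $a > 2$. Then the pole $s = 0$
of the integrand lies in the region bounded by $C_1$ and $C_3$, and since

$$\frac{y^s}{s^2} = \frac{1 + s \log y + (s^2 \log^2 y)/2 + \cdots}{s^2} = \frac{1}{s^2} + \frac{\log y}{s} + \cdots,$$

we have by the residue theorem that

$$\frac{1}{2\pi i} \int_{C_1} \frac{y^s}{s^2}\, ds + \frac{1}{2\pi i} \int_{C_3} \frac{y^s}{s^2}\, ds = \log y.$$

But along $C_3$, which is of length $\pi a$, we have

$$\left| \frac{y^s}{s^2} \right| \le \frac{y^2}{(a - 2)^2},$$

so that, for $a > 4$,

$$\left| \int_{C_3} \frac{y^s}{s^2}\, ds \right| \le \frac{\pi a y^2}{(a - 2)^2} < \frac{4\pi y^2}{a}.$$

Hence, as $a \to \infty$,

$$\int_{C_3} \frac{y^s}{s^2}\, ds \to 0, \qquad \int_{C_1} \frac{y^s}{s^2}\, ds \to \int_{(2)} \frac{y^s}{s^2}\, ds,$$

and the result follows.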
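Theorem 7-9 lends itself to a direct numerical check. The sketch below
is an illustration only: it truncates the line $\sigma = 2$ at $|t| \le T$ and
applies the trapezoidal rule. The truncation height, the step count and
the name perron_weight are our own choices. Since the integrand has
modulus $y^2/(4 + t^2)$, the neglected tails contribute less than
$y^2/(\pi T)$.

```python
import cmath
import math


def perron_weight(y: float, T: float = 1000.0, steps: int = 200_000) -> float:
    """Approximate (1/(2*pi*i)) * integral of y**s / s**2 over Re s = 2, |t| <= T."""
    log_y = math.log(y)
    h = 2.0 * T / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        s = complex(2.0, -T + k * h)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal end weights
        total += weight * cmath.exp(s * log_y) / (s * s)
    # ds = i dt, so the factor i cancels against the 1/(2*pi*i) in front
    return (total * h / (2.0 * math.pi)).real


value_above_one = perron_weight(math.e)  # theorem predicts log e = 1
value_below_one = perron_weight(0.5)     # theorem predicts 0
```

Up to the truncation error, the two computed values reproduce the two
cases of the theorem.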


7-3 The Prime Number Theorem. It will be necessary in what
follows to know something about the location of the zeros of the
$\zeta$-function. For $|t|$ large, this information is supplied by Theorem 7-13;
for $|t|$ small, we use only the fact that $\zeta(s)$ does not vanish for $\sigma \ge 1$.
Historically, this was the first nontrivial result obtained concerning
the zeros of the $\zeta$-function. (A trivial fact is that $\zeta(s) \ne 0$ for $\sigma > 1$,
which follows immediately from the product representation

$$\zeta(s) = \prod_p (1 - p^{-s})^{-1},$$

valid for $\sigma > 1$.) The proof below that $\zeta(1 + ti) \ne 0$ is due to de la
Vallée Poussin; it may have been suggested by the following
considerations.


For $\sigma > 1$ we have

$$\log \zeta(s) = \sum_{m,p} \frac{1}{m p^{ms}} = \sum_p \frac{1}{p^s} + f(s),$$

and $f$ is easily seen to be regular for $\sigma > 1/2$. Since $\zeta$ has a pole at
$s = 1$, with residue 1, it follows that as $\sigma \to 1^+$,

$$\sum_p \frac{1}{p^{\sigma}} \sim \log \frac{1}{\sigma - 1}. \tag{6}$$

We now reason heuristically. If $1 + t_0 i$ is a zero of $\zeta$, and we put
$s = \sigma + t_0 i$, then as $\sigma \to 1^+$,

$$\log |\zeta(s)| \sim \log (\sigma - 1)$$

and

$$\operatorname{Re} \log \zeta(s) - \operatorname{Re} f(s) = \log |\zeta(s)| - \operatorname{Re} f(s) = \sum_p \frac{\cos (t_0 \log p)}{p^{\sigma}} \sim \log (\sigma - 1).$$

Comparing this with (6) we see that for most $p$, $\cos (t_0 \log p)$ must
be close to $-1$. But then $\cos (2t_0 \log p)$ must usually be nearly 1, and

$$\sum_p \frac{\cos (2t_0 \log p)}{p^{\sigma}} \sim \log \frac{1}{\sigma - 1}.$$

But this requires that $\zeta$ have a pole at $1 + 2t_0 i$, which is not the case.
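To make this argument rigorous, note that for all real $\theta$,

$$3 + 4\cos \theta + \cos 2\theta = 2(1 + \cos \theta)^2 \ge 0.$$

Hence for $\sigma > 1$,

$$\log |\zeta^3(\sigma)\, \zeta^4(\sigma + t_0 i)\, \zeta(\sigma + 2t_0 i)| = 3 \log \zeta(\sigma) + 4 \log |\zeta(\sigma + t_0 i)| + \log |\zeta(\sigma + 2t_0 i)|$$

$$= 3 \sum_{m,p} \frac{1}{m p^{m\sigma}} + 4 \sum_{m,p} \frac{\cos (t_0 m \log p)}{m p^{m\sigma}} + \sum_{m,p} \frac{\cos (2t_0 m \log p)}{m p^{m\sigma}}$$

$$= \sum_{m,p} \frac{3 + 4\cos (t_0 m \log p) + \cos (2t_0 m \log p)}{m p^{m\sigma}} \ge 0.$$

Thus

$$\left( (\sigma - 1)\zeta(\sigma) \right)^3 \left| \frac{\zeta(\sigma + t_0 i)}{\sigma - 1} \right|^4 |\zeta(\sigma + 2t_0 i)| \ge \frac{1}{\sigma - 1},$$

and if $1 + t_0 i$ were a zero of $\zeta$, the left side in this inequality would
remain bounded as $\sigma \to 1^+$, while the right side increases without
limit.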
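The trigonometric identity on which the rigorous argument rests is a
one-line consequence of the double-angle formula $\cos 2\theta = 2\cos^2 \theta - 1$.
The step, written out here as a check, is

$$3 + 4\cos \theta + \cos 2\theta = 3 + 4\cos \theta + (2\cos^2 \theta - 1) = 2(1 + 2\cos \theta + \cos^2 \theta) = 2(1 + \cos \theta)^2.$$

Equality with zero occurs exactly when $\cos \theta = -1$; for example,
$\theta = \pi$ gives $3 - 4 + 1 = 0$. This is the extreme case the heuristic
argument would require for most primes $p$.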

We now use this technique, together with Theorem 7-8, to show that
$\zeta(s)$ does not vanish at any point too close to the line $\sigma = 1$ and
sufficiently far from the real axis.


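THEOREM 7-10. For $\sigma > 1$,

$$\operatorname{Re} \left( -3 \frac{\zeta'}{\zeta}(\sigma) - 4 \frac{\zeta'}{\zeta}(\sigma + ti) - \frac{\zeta'}{\zeta}(\sigma + 2ti) \right) \ge 0.$$

Proof: Differentiating the relation

$$\log \zeta(s) = \sum_{m,p} \frac{1}{m p^{ms}}$$

we obtain

$$\frac{\zeta'}{\zeta}(s) = -\sum_{m,p} \frac{\log p}{p^{ms}} = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}, \tag{7}$$

where

$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^m, \text{ for any } m > 0 \text{ and prime } p, \\ 0 & \text{otherwise}. \end{cases}$$

The termwise differentiation is justified because the series for $\log \zeta(s)$
converges uniformly in any region to the right of $\sigma = 1$. Hence

$$\operatorname{Re} \left( -3 \frac{\zeta'}{\zeta}(\sigma) - 4 \frac{\zeta'}{\zeta}(\sigma + ti) - \frac{\zeta'}{\zeta}(\sigma + 2ti) \right) = \operatorname{Re} \sum_{n=1}^{\infty} \frac{(3 + 4n^{-ti} + n^{-2ti}) \Lambda(n)}{n^{\sigma}}$$

$$= \sum_{n=1}^{\infty} \frac{(3 + 4\cos (t \log n) + \cos (2t \log n)) \Lambda(n)}{n^{\sigma}} \ge 0.$$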


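THEOREM 7-11. (a) For $\sigma \ge 1/2$ and $t > c$, we have $|\zeta(s)| < t$.
(b) For $t \ge 8$ and $\sigma \ge 1 - (\log t)^{-1}$, we have $|\zeta(s)| < c \log t$.

Proof: For $\sigma \ge 2$ and $t \ge 8$,

$$|\zeta(s)| \le \sum_{n=1}^{\infty} \frac{1}{n^2} < 2 < \log t < t.$$

For $\sigma < 2$, both inequalities of the theorem follow from Theorem
7-4.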


THEOREM 7-12. For $\sigma > 1$,

$$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s},$$

where $\mu$ is the Möbius function.

Proof: This follows immediately from Theorem 6-12 for $s$ real
and greater than 1; by analytic continuation, it is correct for $\sigma > 1$.
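Theorem 7-12 can be illustrated at $s = 2$, where $\zeta(2) = \pi^2/6$ is
classical, so that the series should sum to $6/\pi^2$. The sketch below is
ours and purely illustrative; the name mobius_upto is hypothetical. The
tail of the series beyond $N$ is smaller than $1/N$ in absolute value.

```python
from math import pi


def mobius_upto(n: int) -> list:
    """Return a list mu with mu[k] equal to the Möbius function of k, 1 <= k <= n."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one more prime factor: flip the sign
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0               # divisible by a square: mu vanishes
    return mu


N = 10**4
mu = mobius_upto(N)
partial_sum = sum(mu[n] / n**2 for n in range(1, N + 1))
```

The partial sum agrees with $6/\pi^2 \approx 0.6079$ to within the tail bound.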


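THEOREM 7-13. There are constants $c_1 \ge 8$ and $c_2 > 0$ such that
$\zeta(s) \ne 0$ for

$$t \ge c_1 \quad \text{and} \quad \sigma \ge 1 - \frac{c_2}{\log t}.$$

Proof: In accordance with Theorem 7-11 (a), choose $c_3 \ge 8$ such
that

$$|\zeta(s)| < t, \quad \text{for } \sigma \ge 1/2,\ t \ge c_3. \tag{8}$$

[FIGURE 7-2]

Inasmuch as

$$1 + \frac{c_4}{\log x} - \frac{1}{4} < 1 - \frac{c_2}{\log x}, \quad \text{for } x > e^{4(c_2 + c_4)},$$

it suffices to show that any zero $\beta + \gamma i$ of $\zeta$ with $\gamma$ sufficiently large
(in particular, larger than 8) and for which

$$\beta > 1 + \frac{c_4}{\log \gamma} - \frac{1}{4}$$

is such that

$$\beta < 1 - \frac{c_2}{\log \gamma}.$$

Put

$$\sigma_0 = \sigma_0(\gamma) = 1 + \frac{c_4}{\log \gamma},$$

and suppose that $\beta + \gamma i$ is a zero of $\zeta$ for which $\gamma > e^{4(c_2 + c_4)}$ and
$\beta > \sigma_0 - 1/4$. We shall apply Theorem 7-8, once with $s_0 = \sigma_0 + \gamma i$
and once with $s_0 = \sigma_0 + 2\gamma i$. In either case, since $\sigma_0 > 1$, we have
that for $\gamma > c_3 + 1/2$, the circle $|s - s_0| \le 1/2$ lies in the quadrant
$\sigma \ge 1/2$, $t \ge c_3$. Since $\gamma > e^{c_4}$, we have $\sigma_0 < 2$, and by Theorem 7-12,

$$\left| \frac{1}{\zeta(s_0)} \right| = \left| \sum_{n=1}^{\infty} \frac{\mu(n)}{n^{s_0}} \right| \le \sum_{n=1}^{\infty} \frac{1}{n^{\sigma_0}} < 1 + \int_1^{\infty} \frac{du}{u^{\sigma_0}} = 1 + \frac{1}{\sigma_0 - 1} = 1 + \frac{1}{c_4} \log \gamma.$$

Thus for each $c_4 > 0$ there is a $c_5$ such that for $\gamma > c_5 > c_3 + 1/2$, the
inequality

$$\left| \frac{\zeta(s)}{\zeta(s_0)} \right| \le \left( 2\gamma + \frac{1}{2} \right) \left( 1 + \frac{1}{c_4} \log \gamma \right) < \gamma^{1 + c_4}$$

holds at every point $s$ of the circular disk $|s - s_0| \le 1/2$, since at every
such point, $c_3 \le t \le 2\gamma + 1/2$. If $\gamma > c_5$, we can now apply Theorem
7-8, with $r = 1/2$, $f(s) = \zeta(s)$, $M = (1 + c_4) \log \gamma$. Using the first
inequality of that theorem with $s_0 = \sigma_0 + 2\gamma i$, we obtain

$$-\operatorname{Re} \frac{\zeta'}{\zeta}(\sigma_0 + 2\gamma i) \le 8(1 + c_4) \log \gamma; \tag{9}$$

using the second with $s_0 = \sigma_0 + \gamma i$ we have

$$-\operatorname{Re} \frac{\zeta'}{\zeta}(\sigma_0 + \gamma i) \le 8(1 + c_4) \log \gamma - \frac{1}{\sigma_0 - \beta}, \tag{10}$$

since

$$\sigma_0 - r/2 = \sigma_0 - 1/4 < \beta < 1 < \sigma_0.$$

Finally, since $\sigma_0 \to 1$ as $\gamma \to \infty$, we have from (6) that for $\epsilon > 0$,

$$-\frac{\zeta'}{\zeta}(\sigma_0) < \frac{1 + \epsilon}{\sigma_0 - 1} = \frac{1 + \epsilon}{c_4} \log \gamma \tag{11}$$

for $\gamma > c_6$. Using the estimates (9), (10), and (11) in Theorem
7-10 gives

$$\frac{3(1 + \epsilon)}{c_4} \log \gamma + 4 \cdot 8(1 + c_4) \log \gamma - \frac{4}{\sigma_0 - \beta} + 8(1 + c_4) \log \gamma > 0.$$

This inequality can easily be simplified to

$$\sigma_0 - \beta > \frac{c_7}{\log \gamma},$$

where

$$c_7 = \frac{4c_4}{3(1 + \epsilon) + 40(1 + c_4)c_4},$$

and this gives

$$\beta < 1 - \frac{c_7 - c_4}{\log \gamma}.$$

It is clear that $c_7 > c_4$ if $c_4$ and $\epsilon$ are sufficiently small, and we can
then take $c_2 = c_7 - c_4$ and $c_1 = \max (c_5, c_6)$.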
THEOREM 7-14. If $0 < c_8 < c_2$, then

$$|\log \zeta(s)| < \log^2 t \quad \text{for } t \ge c_9 \text{ and } \sigma \ge 1 - \frac{c_8}{\log t}.$$

Proof: We use Theorem 7-7, with $s_0 = 2 + t_0 i$, for some $t_0 > 8$ to
be determined. For $t_0$ sufficiently large, the circular region

$$|s - s_0| \le 1 + \frac{c_2 + c_8}{2 \log t_0} \tag{12}$$

lies entirely in the region described in the preceding theorem, in which
$\zeta$ has no zeros. Hence the function $\log \zeta(s)$ is regular in this disk, and
by Theorem 7-11(b),

$$\operatorname{Re} \log \zeta(s) = \log |\zeta(s)| < \log (c \log t) < \log (c \log (t_0 + 2)) < c_{10} \log \log t_0.$$

Hence, by Theorem 7-7, we have that for $s$ in the concentric disk
$|s - s_0| \le 1 + c_8/\log t_0$ contained in the region (12),

$$|\log \zeta(s)| \le |\log \zeta(s_0)| + \frac{2 \cdot 2 (c_{10} \log \log t_0 + |\log \zeta(s_0)|)}{\dfrac{c_2 - c_8}{2} \cdot \dfrac{1}{\log t_0}}$$

$$< c + c \log t_0 \log \log t_0 < \log^2 t_0,$$

if $t_0$ is sufficiently large. This inequality holds on the radius extending
toward the left from $s_0$, for every large $t_0$, and hence throughout a
region $t \ge c_9$, $1 - c_8 (\log t)^{-1} \le \sigma \le 2$. Finally, $|\zeta(s)|$ and $|1/\zeta(s)|$
are bounded in the half-plane $\sigma \ge 2$, and $|\log \zeta(s)|$ is consequently
smaller than $\log^2 t$ for $t$ large and $\sigma \ge 2$.


THEOREM 7-15. There is a constant $\alpha > 0$ such that as $x \to \infty$,

$$\sum_{p \le x} \log \frac{x}{p} = \int_c^1 \frac{x^s}{s^2}\, ds + O(x e^{-\alpha \sqrt{\log x}}),$$

for some $c$ with $0 < c < 1$.


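Proof: Using Theorem 7-9, we have

$$\frac{1}{2\pi i} \int_{(2)} \frac{x^s}{s^2} \log \zeta(s)\, ds = \frac{1}{2\pi i} \int_{(2)} \frac{x^s}{s^2} \sum_{n=2}^{\infty} \frac{\Lambda(n)}{n^s \log n}\, ds = \sum_{n=2}^{\infty} \frac{\Lambda(n)}{\log n} \cdot \frac{1}{2\pi i} \int_{(2)} \frac{(x/n)^s}{s^2}\, ds$$

$$= \sum_{n \le x} \frac{\Lambda(n)}{\log n} \log \frac{x}{n} = \sum_{\substack{m,p \\ p^m \le x}} \frac{1}{m} \log \frac{x}{p^m} = \sum_{p \le x} \log \frac{x}{p} + \sum_{\substack{m \ge 2 \\ p^m \le x}} \frac{1}{m} \log \frac{x}{p^m}.$$

[FIGURE 7-3: the region $\Omega$ and its boundary arcs; marked points include $2 + ui$ and $1 - c_8/\log u + ui$]

As noted earlier, the number of terms in the last sum is

$$\pi(x^{1/2}) + \pi(x^{1/3}) + \cdots \le x^{1/2} + x^{1/3} + \cdots + x^{1/u} < u x^{1/2},$$

where $u$ is the smallest integer such that $x^{1/u} < 2$. Thus

$$\pi(x^{1/2}) + \pi(x^{1/3}) + \cdots = O(x^{1/2} \log x),$$

so that

$$2\pi i \sum_{p \le x} \log \frac{x}{p} = \int_{(2)} \frac{x^s}{s^2} \log \zeta(s)\, ds + O(x e^{-\alpha \sqrt{\log x}}),$$

since

$$\sum_{\substack{m \ge 2 \\ p^m \le x}} \frac{1}{m} \log \frac{x}{p^m} \le \log x \sum_{\substack{m \ge 2 \\ p^m \le x}} 1 = O(\sqrt{x} \log^2 x) = O(x e^{-\alpha \sqrt{\log x}}).$$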
m>2 Mm Pp m>2 
p™Sz pr<z 
We now cut the complex plane along the real axis, extending the
cut from $s = 1$ to the left, and examine the function $\log \zeta(s)$ in the
cut plane. If $\bar{z}$ is the complex conjugate of $z$, then $\zeta(\bar{s}) = \overline{\zeta(s)}$ and
$\log \bar{z} = \overline{\log z}$, so that $\log \zeta(\bar{s}) = \overline{\log \zeta(s)}$. Hence, by Theorem 7-13,
$\zeta(s) \ne 0$ for $|t| \ge c_9 \ge c_1$ and $\sigma \ge 1 - c_8 (\log |t|)^{-1}$. Moreover,
since $\zeta(s)$ does not vanish on the line $\sigma = 1$, and since its zeros have no
finite limit point in any half-plane $\sigma \ge \sigma_0 > 0$ (since $\zeta(s)$ is regular
there), there is a constant $c_{11} > 0$ such that $\zeta(s) \ne 0$ in the rectangle

$$1 - c_{11} \le \sigma \le 1, \quad |t| \le c_9.$$

Finally, $\zeta(s) \ne 0$ for $1 < \sigma \le 2$, and the only singularity of the
function in the half-plane $\sigma > 0$ is at $s = 1$. Consequently, for
arbitrary $u > c_9$, $\log \zeta(s)$ is a single-valued analytic function in the
region $\Omega$ shown in Fig. 7-3, bounded by the arcs $\Gamma_1, \Gamma_2, \ldots, \Gamma_6, \Gamma_7,
\bar{\Gamma}_6, \ldots, \bar{\Gamma}_2, \bar{\Gamma}_1$. Hence if we denote by $\Gamma$ the complete boundary of
this region (so that we might write symbolically $\Gamma = \Gamma_1 + \Gamma_2 + \cdots
+ \bar{\Gamma}_1$), we have by Cauchy's theorem that

$$\int_{\Gamma} \frac{x^s}{s^2} \log \zeta(s)\, ds = 0.$$

It follows that, if the integrals are taken in the positive direction,

$$\int_{(2)} \frac{x^s}{s^2} \log \zeta(s)\, ds = \left( \int_{2 - \infty i}^{2 - ui} + \int_{\bar{\Gamma}_1} + \int_{\Gamma_1} + \int_{2 + ui}^{2 + \infty i} \right) \frac{x^s}{s^2} \log \zeta(s)\, ds$$

$$= \left( \int_{2 - \infty i}^{2 - ui} - \int_{\Gamma_2 + \cdots + \Gamma_6 + \Gamma_7 + \bar{\Gamma}_6 + \cdots + \bar{\Gamma}_2} + \int_{2 + ui}^{2 + \infty i} \right) \frac{x^s}{s^2} \log \zeta(s)\, ds.$$


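We shall show that all the other integrals are small in comparison
with those along $\Gamma_6$ and $\bar{\Gamma}_6$, if $u$ is sufficiently large. For brevity, put

$$\psi(x, s) = \frac{x^s}{s^2} \log \zeta(s).$$

By Theorem 7-14, we have that for $u > u_0(\epsilon)$,

$$\left| \int_{2 + ui}^{2 + \infty i} \psi(x, s)\, ds \right| \le \int_{2 + ui}^{2 + \infty i} \left| \frac{x^s}{s^2} \right| |\log \zeta(s)|\, |ds| < x^2 \int_u^{\infty} \frac{\log^2 t}{t^2}\, dt < \frac{x^2}{u^{1 - \epsilon}},$$

so that

$$\lim_{u \to \infty} \int_{2 + ui}^{2 + \infty i} \psi(x, s)\, ds = 0.$$

The same estimate applies to the integral from $2 - \infty i$ to $2 - ui$.

Since the length of $\Gamma_2$ is less than 2, and the integrand is again
smaller than $x^2 \log^2 u / u^2$ for $u$ large, we have

$$\lim_{u \to \infty} \int_{\Gamma_2} \psi(x, s)\, ds = 0,$$

and similar considerations give

$$\lim_{u \to \infty} \int_{\bar{\Gamma}_2} \psi(x, s)\, ds = 0.$$

Along $\Gamma_3$ we have $s = 1 - c_8 (\log t)^{-1} + ti$, so that

$$\int_{\Gamma_3} \psi(x, s)\, ds = O\left( \int_{c_9}^{u} x^{1 - c_8 (\log t)^{-1}}\, \frac{\log^2 t}{t^2}\, dt \right).$$

Now suppose that $x$, and then $u$, are chosen large enough that

$$c_9 < e^{\sqrt{c_8 \log x}} < u.$$

Then

$$\int_{\Gamma_3} \psi(x, s)\, ds = O\left( x^{1 - c_8 (c_8 \log x)^{-1/2}} \int_{c_9}^{e^{\sqrt{c_8 \log x}}} \frac{\log^2 t}{t^2}\, dt + x \int_{e^{\sqrt{c_8 \log x}}}^{u} \frac{\log^2 t}{t^2}\, dt \right)$$

$$= O\left( x e^{-\sqrt{c_8 \log x}} + x e^{-\sqrt{c_8 \log x}} \log x \right) = O(x e^{-\alpha \sqrt{\log x}}),$$

where $\alpha = \sqrt{c_8}/2$.
By symmetry,

$$\int_{\bar{\Gamma}_3} \psi(x, s)\, ds = O(x e^{-\alpha \sqrt{\log x}}).$$

The paths $\Gamma_4$, $\Gamma_5$, $\bar{\Gamma}_4$, and $\bar{\Gamma}_5$ are of fixed lengths, and on them

$$\psi(x, s) = O(x^{1 - c}) = o(x e^{-\alpha \sqrt{\log x}}),$$

so that the same estimate holds for the integrals themselves.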


$\Gamma_7$ is described by the relations $s = 1 + \delta e^{i\theta}$, $|\theta| < \pi$, where $\delta > 0$.
Since $(s - 1)\zeta(s) \to 1$ as $s \to 1$, we have

$$\operatorname{Re} \log \zeta(s) = \log |\zeta(s)| \sim -\log |s - 1| = -\log \delta,$$
$$\operatorname{Im} \log \zeta(s) = \arg \zeta(s) = O(1)$$

as $\delta \to 0^+$. Hence

$$\int_{\Gamma_7} \psi(x, s)\, ds = O\left( x^{1 + \delta}\, \delta \log \frac{1}{\delta} \right) = o(1).$$

Combining all these results, we can take the limit as $u \to \infty$ and
$\delta \to 0^+$ and obtain

$$2\pi i \sum_{p \le x} \log \frac{x}{p} = -\int_{1 - c_{11}}^{1} \psi(x, s)\, ds - \int_{1}^{1 - c_{11}} \psi(x, s)\, ds + O(x e^{-\alpha \sqrt{\log x}}),$$

where the first integral is along the upper edge, and the second along
the lower edge, of the cut. We know that $(s - 1)\zeta(s) = R(s)$ is
regular in the region $\sigma > 0$, and that it has no zeros in the region
$\sigma \ge 1 - c_{11}$, $|t| \le c_9$. Hence the function

$$\log ((s - 1)\zeta(s)) = \log (s - 1) + \log \zeta(s)$$

is single-valued in this region; since $\log (s - 1)$ has, on the upper
and lower edges of the cut, values which differ by $2\pi i$, the same is
true of $\log \zeta(s)$, if the difference is taken in reverse order. Hence
if $s^+$ indicates the upper edge of the cut, and $s^-$ the lower edge, then

$$-\int_{1 - c_{11}}^{1} \psi(x, s^+)\, ds - \int_{1}^{1 - c_{11}} \psi(x, s^-)\, ds = \int_{1 - c_{11}}^{1} \frac{x^s}{s^2} \left( \log \zeta(s^-) - \log \zeta(s^+) \right) ds = 2\pi i \int_{1 - c_{11}}^{1} \frac{x^s}{s^2}\, ds,$$

and

$$\sum_{p \le x} \log \frac{x}{p} = \int_{1 - c_{11}}^{1} \frac{x^s}{s^2}\, ds + O(x e^{-\alpha \sqrt{\log x}}). \tag{13}$$

The theorem is proved.


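THEOREM 7-16. As $x \to \infty$,

$$\pi(x) = \int_2^x \frac{du}{\log u} + O(x e^{-\frac{1}{2}\alpha \sqrt{\log x}}).$$

Proof: Replace $1 - c_{11}$ by $c$ in (13), and put

$$\delta = \delta(x) = e^{-\frac{1}{2}\alpha \sqrt{\log x}}.$$

Then since $\log (1 + \delta) \sim \delta$ as $x \to \infty$, we have

$$\sum_{p \le x(1+\delta)} \log \frac{x(1+\delta)}{p} - \sum_{p \le x} \log \frac{x}{p} = \sum_{p \le x} \log (1 + \delta) + \sum_{x < p \le x(1+\delta)} \log \frac{x(1+\delta)}{p}$$

$$= \log (1 + \delta)\, \pi(x) + O(\log (1 + \delta) \cdot \delta x)$$

$$= \int_c^1 \frac{x^s}{s^2} \left( (1 + \delta)^s - 1 \right) ds + O(x e^{-\alpha \sqrt{\log x}}),$$

so that

$$\pi(x) = \frac{1}{\log (1 + \delta)} \int_c^1 \frac{(1 + \delta)^s - 1}{s^2}\, x^s\, ds + O(\delta x) + O\left( \frac{x e^{-\alpha \sqrt{\log x}}}{\delta} \right).$$

Now

$$(1 + \delta)^s - 1 = s\delta + \frac{s(s - 1)}{2}\, \delta^2 (1 + \vartheta \delta)^{s - 2},$$

where $0 < \vartheta < 1$, so that for $0 < s \le 1$,

$$|(1 + \delta)^s - 1 - s\delta| \le \frac{s|s - 1|}{2}\, \delta^2 < s\delta^2.$$

Thus, making the change of variable $x^s = u$, we obtain

$$\int_c^1 \frac{(1 + \delta)^s - 1}{s^2}\, x^s\, ds = \delta \int_c^1 \frac{x^s}{s}\, ds + O\left( \delta^2 \int_c^1 \frac{x^s}{s}\, ds \right)$$

$$= \delta \int_{x^c}^{x} \frac{du}{\log u} + O\left( \delta^2 \int_c^1 x^s\, ds \right)$$

$$= \delta \int_2^x \frac{du}{\log u} + O(\delta^2 x).$$

Finally,

$$\pi(x) = \frac{\delta}{\log (1 + \delta)} \int_2^x \frac{du}{\log u} + O(\delta x) + O\left( \frac{x e^{-\alpha \sqrt{\log x}}}{\delta} \right)$$

$$= (1 + O(\delta)) \int_2^x \frac{du}{\log u} + O(\delta x) + O(x e^{-\frac{1}{2}\alpha \sqrt{\log x}})$$

$$= \int_2^x \frac{du}{\log u} + O(\delta x) = \int_2^x \frac{du}{\log u} + O(x e^{-\frac{1}{2}\alpha \sqrt{\log x}}).$$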

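The Prime Number Theorem is a very weak consequence of Theorem
7-16, since

$$\int_2^x \frac{du}{\log u} = \left[ \frac{u}{\log u} \right]_2^x + \int_2^x \frac{du}{\log^2 u} \sim \frac{x}{\log x}$$

and

$$x e^{-c \sqrt{\log x}} = o\left( \frac{x}{\log x} \right)$$

for every $c > 0$. In fact, we see by repeated integration by parts
that the relation

$$\pi(x) = \frac{x}{\log x} + \frac{1!\, x}{\log^2 x} + \frac{2!\, x}{\log^3 x} + \cdots + \frac{(m-1)!\, x}{\log^m x} + O\left( \frac{x}{\log^{m+1} x} \right)$$

holds for every positive integer $m$.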
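The difference in quality between the approximations $\int_2^x du/\log u$
and $x/\log x$ is visible even for modest $x$. The following Python sketch
is ours and only illustrative; the names prime_count and li_from_2 are
hypothetical, and the integral is approximated by the composite Simpson
rule with a step count chosen by us.

```python
from math import log


def prime_count(x: int) -> int:
    """pi(x), computed with the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)


def li_from_2(x: float, n: int = 200_000) -> float:
    """Composite Simpson approximation to the integral of du/log u over [2, x]."""
    h = (x - 2) / n
    total = 1 / log(2) + 1 / log(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) / log(2 + i * h)
    return total * h / 3


x = 10**5
pi_x = prime_count(x)
integral = li_from_2(x)
crude = x / log(x)
```

At $x = 10^5$ the integral misses $\pi(x)$ by a few dozen, while $x/\log x$
misses it by several hundred, in keeping with the second-order term
$x/\log^2 x$ of the expansion.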

The coefficient a occurring in the remainder term in Theorem 7-16 
can easily be bounded explicitly; it can be shown for example that 
a = qs is an allowable value, by choosing ce = yoop, C2 = 3bo> 
Cs = 7o0n 41 = To € = Teo- However, no result of this type is as 
good as the known result that 


$$\pi(x) = \int_2^x \frac{du}{\log u} + O(x e^{-c \sqrt{\log x \log \log x}}).$$


In a variant of the proof given here, the factor $\log \zeta(s)$ in the
integrand is replaced by $\zeta'(s)/\zeta(s)$. The logarithmic singularity at
$s = 1$ is then replaced by a simple pole, which makes the analysis
somewhat less complicated. On the other hand this gives an estimate
not of

$$\pi(x) = \sum_{p \le x} 1,$$

but of

$$\psi(x) = \sum_{p \le x} \log p \left[ \frac{\log x}{\log p} \right],$$

and an additional step is needed to obtain the final result.


7-4 Extension to primes in an arithmetic progression. For relatively
prime integers $k$ and $l$, let $\pi(x; k, l)$ be the number of primes
$p \equiv l \pmod{k}$ which do not exceed $x$. For given $k$, there are $\varphi(k) = h$
choices of $l$ which are distinct modulo $k$, so that if the primes are more
or less evenly dispersed among the various progressions, it is to be
expected that

$$\pi(x; k, l) \sim \frac{1}{h} \cdot \frac{x}{\log x}.$$

It is the object of this section to show that this is the case, and in fact
to obtain an estimate for $\pi(x; k, l)$ similar to that given in Theorem
7-16 for $\pi(x) = \pi(x; 1, 0)$. Several proofs will not be given in full
detail since they are similar to those of the preceding sections. As in
the preceding chapter, we isolate the primes lying in a given arithmetic
progression by use of characters and $L$-functions. The $L$-functions
are in turn simple combinations of Hurwitz $\zeta$-functions, as the
following theorem shows.


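THEOREM 7-17. For $\sigma > 1$,

$$L(s, \chi) = \frac{1}{k^s} \sum_{a=1}^{k} \chi(a)\, \zeta\left( s, \frac{a}{k} \right). \tag{14}$$

Proof: Since $\chi$ is periodic of period $k$,

$$L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \sum_{a=1}^{k} \chi(a) \sum_{m=0}^{\infty} \frac{1}{(km + a)^s} = \frac{1}{k^s} \sum_{a=1}^{k} \chi(a)\, \zeta\left( s, \frac{a}{k} \right).$$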
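Relation (14) is a rearrangement of the defining series, and this can
be seen term by term on truncated sums. The sketch below is an
illustration under assumptions of our own: $k = 4$, $\chi$ the nonprincipal
character modulo 4, and $s = 2$, where the value of $L(2, \chi)$ is Catalan's
constant $0.9159655\ldots$. The function names are hypothetical.

```python
def chi4(n: int) -> int:
    """Nonprincipal character modulo 4 (assumed example character)."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def hurwitz_partial(s: float, w: float, terms: int) -> float:
    """First `terms` terms of the Hurwitz series sum_{n >= 0} (n + w)**(-s)."""
    return sum((n + w) ** (-s) for n in range(terms))


def L_via_hurwitz(s: float, terms: int, k: int = 4) -> float:
    """Right side of (14), with each Hurwitz series truncated after `terms` terms."""
    return k ** (-s) * sum(
        chi4(a) * hurwitz_partial(s, a / k, terms) for a in range(1, k + 1)
    )


def L_direct(s: float, terms: int, k: int = 4) -> float:
    """Left side of (14): the Dirichlet series over n <= k * terms."""
    return sum(chi4(n) * n ** (-s) for n in range(1, k * terms + 1))


lhs = L_direct(2.0, 2000)
rhs = L_via_hurwitz(2.0, 2000)
```

The truncations were matched so that both sides contain exactly the
terms with $n \le 4 \cdot 2000$; they therefore agree up to rounding.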


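The domain of validity of (14) can be extended somewhat. If we
put

$$E(\chi) = \begin{cases} 1 & \text{if } \chi = \chi_0, \\ 0 & \text{if } \chi \ne \chi_0, \end{cases}$$

then the first equation of Theorem 6-6 becomes

$$\sum_{a=1}^{k} \chi(a) = E(\chi) h.$$

Hence, for $\sigma > 1$,

$$L(s, \chi) - \frac{hE(\chi)}{k^s (s - 1)} = \frac{1}{k^s} \sum_{a=1}^{k} \chi(a) \left( \zeta\left( s, \frac{a}{k} \right) - \frac{1}{s - 1} \right).$$

By Theorem 7-3, each summand on the right is regular for $\sigma > 0$, and

$$\frac{1}{s - 1} \left( \frac{1}{k^s} - \frac{1}{k} \right) = \frac{k^{1-s} - 1}{k(s - 1)}$$

is an integral function. By analytic continuation, we have

THEOREM 7-18. The relation (14) holds for $\sigma > 0$, except at $s = 1$.
Moreover,

$$\lim_{s \to 1} (s - 1) L(s, \chi) = \frac{hE(\chi)}{k}. \tag{15}$$

Hence $L(s, \chi)$ is regular for $\sigma > 0$, except that $L(s, \chi_0)$ has a simple
pole at $s = 1$.

For $\sigma \ge 2$ and $t \ge 8$,

$$|L(s, \chi)| \le \sum_{n=1}^{\infty} \frac{1}{n^2} < 2 < \log t < t,$$

while for $\sigma > 0$ and $t > 0$,

$$|L(s, \chi)| \le \sum_{a=1}^{k} \left| \zeta\left( s, \frac{a}{k} \right) \right|,$$

so that Theorem 7-4 yields

THEOREM 7-19. (a) For $\sigma \ge 1/2$ and $t > c_{12}(k)$, we have $|L(s, \chi)| < t$.
(b) For $t \ge 8$ and $\sigma \ge 1 - (\log t)^{-1}$, we have $|L(s, \chi)| <
c_{13}(k) \log t$.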


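The proof of the nonvanishing of $\zeta(s)$ on $\sigma = 1$ can be generalized
in a simple way.

THEOREM 7-20. $L(s, \chi)$ does not vanish on the line $\sigma = 1$.

Proof: For $\sigma > 1$,

$$L(s, \chi) = \prod_p (1 - \chi(p) p^{-s})^{-1},$$

so that we can choose

$$\log L(s, \chi) = \sum_{m,p} \frac{\chi(p^m)}{m p^{ms}}.$$

Hence

$$\log |L^3(\sigma, \chi_0)\, L^4(\sigma + ti, \chi)\, L(\sigma + 2ti, \chi^2)|$$

$$= 3 \log |L(\sigma, \chi_0)| + 4 \log |L(\sigma + ti, \chi)| + \log |L(\sigma + 2ti, \chi^2)|$$

$$= 3 \log L(\sigma, \chi_0) + 4 \operatorname{Re} \log L(\sigma + ti, \chi) + \operatorname{Re} \log L(\sigma + 2ti, \chi^2)$$

$$= \sum_{\substack{m,p \\ p \nmid k}} \frac{3}{m p^{m\sigma}} + \operatorname{Re} \sum_{m,p} \frac{4\chi(p^m)}{m p^{m(\sigma + ti)}} + \operatorname{Re} \sum_{m,p} \frac{\chi^2(p^m)}{m p^{m(\sigma + 2ti)}}$$

$$= \sum_{\substack{m,p \\ p \nmid k}} \frac{3 + 4\cos (\eta(p^m) - t \log p^m) + \cos 2(\eta(p^m) - t \log p^m)}{m p^{m\sigma}} \ge 0,$$

where $\chi(p^m) = e^{i\eta(p^m)}$. Thus

$$\left( (\sigma - 1) L(\sigma, \chi_0) \right)^3 \left| \frac{L(\sigma + ti, \chi)}{\sigma - 1} \right|^4 |L(\sigma + 2ti, \chi^2)| \ge \frac{1}{\sigma - 1},$$

and the falsity of the theorem would contradict Theorem 7-18.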
By now the proof of the following analog of Theorem 7-10 should
be a simple exercise for the reader.

THEOREM 7-21. For $\sigma > 1$,

$$-3 \frac{L'}{L}(\sigma, \chi_0) - 4 \operatorname{Re} \frac{L'}{L}(\sigma + ti, \chi) - \operatorname{Re} \frac{L'}{L}(\sigma + 2ti, \chi^2) \ge 0.$$

Theorem 7-13 becomes

THEOREM 7-22. There is a $c_1(k) \ge 8$ such that $L(s, \chi) \ne 0$ for
$t \ge c_1(k)$ and $\sigma \ge 1 - c_2/\log t$.

The only difference in the proofs is that now the first inequality of
Theorem 7-8 is applied with $f(s) = L(s, \chi^2)$ and $s_0 = \sigma + 2ti$, while
the second is applied with $f(s) = L(s, \chi)$ and $s_0 = \sigma + ti$. Also, the
constants now depend on $k$. After these trivial modifications, the
proofs are identical.


Similarly, replacing $\zeta(s)$ by $L(s, \chi)$ throughout, Theorem 7-14
becomes

THEOREM 7-23. For $t \ge c_9(k) \ge 8$ and $\sigma \ge 1 - c_8 (\log t)^{-1}$,

$$|\log L(s, \chi)| < \log^2 t.$$

The constant $c_8(k)$ may be different from the $c_8$ of Theorem 7-14;
the subscript is retained to facilitate reference to Fig. 7-3. In the
same way, $c_{11}$ becomes $c_{11}(k)$.

Instead of proceeding directly to the analog of Theorem 7-15, it is
convenient to break the argument into two steps.


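THEOREM 7-24. For $(k, l) = 1$, we have

$$\sum_{\substack{p \le x \\ p \equiv l \pmod{k}}} \log \frac{x}{p} = \frac{1}{2\pi i h} \sum_{\chi} \frac{1}{\chi(l)} \int_{(2)} \frac{x^s}{s^2} \log L(s, \chi)\, ds + O(\sqrt{x} \log^2 x).$$

Proof: Using Theorem 7-9 and the series expansion for $\log L(s, \chi)$,
we obtain

$$\frac{1}{2\pi i} \int_{(2)} \frac{x^s}{s^2} \log L(s, \chi)\, ds = \frac{1}{2\pi i} \int_{(2)} \frac{x^s}{s^2} \sum_{m,p} \frac{\chi(p^m)}{m p^{ms}}\, ds = \frac{1}{2\pi i} \sum_{m,p} \frac{\chi(p^m)}{m} \int_{(2)} \frac{(x/p^m)^s}{s^2}\, ds$$

$$= \sum_{\substack{m,p \\ p^m \le x}} \frac{\chi(p^m) \log (x/p^m)}{m} = \sum_{p \le x} \chi(p) \log \frac{x}{p} + \sum_{\substack{m \ge 2 \\ p^m \le x}} \frac{\chi(p^m) \log (x/p^m)}{m}$$

$$= \sum_{p \le x} \chi(p) \log \frac{x}{p} + O(\sqrt{x} \log^2 x).$$

Multiplying by $1/\chi(l)$ and summing over all characters modulo $k$, we
deduce with the help of Theorem 6-7 that

$$\sum_{\chi} \frac{1}{\chi(l)} \sum_{\substack{p \le x \\ p \nmid k}} \chi(p) \log \frac{x}{p} = h \sum_{\substack{p \le x \\ p \equiv l \pmod{k}}} \log \frac{x}{p}$$

$$= \frac{1}{2\pi i} \sum_{\chi} \frac{1}{\chi(l)} \int_{(2)} \frac{x^s}{s^2} \log L(s, \chi)\, ds + O(\sqrt{x} \log^2 x),$$

which is the theorem. (Here and throughout the remainder of this
section, the implied constant in the $O$-symbol may depend on $k$.)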


256 THE PRIME NUMBER THEOREM [cuap. 7 


To estimate the integrals appearing in Theorem 7-24, we must
distinguish two cases. First consider the case $\chi = \chi_0$. Every
property of the integrand which was used in estimating

$$\int_{(2)} \frac{x^s}{s^2}\log\zeta(s)\,ds$$

carries over to the integrand of

$$\int_{(2)} \frac{x^s}{s^2}\log L(s,\chi_0)\,ds.$$

It follows that for suitable $c$ with $0 < c < 1$,

$$\int_{(2)} \frac{x^s}{s^2}\log L(s,\chi_0)\,ds = 2\pi i\int_c^1 \frac{x^s}{s^2}\,ds + O\bigl(x e^{-\alpha\sqrt{\log x}}\bigr).$$

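On the other hand, if $\chi \ne \chi_0$, then $L(s, \chi)$ has no pole at $s = 1$, but
the other properties used earlier still obtain. Hence, if we do not
cut the plane, but consider the line segments $\Gamma_5$ and $\Gamma_9$ in Fig. 7-3
as a single segment $\Gamma_5'$, and omit $\Gamma_6$, $\Gamma_7$, and $\Gamma_8$, then the function

$$\psi(s,\chi) = \frac{x^s}{s^2}\log L(s,\chi)$$

is regular in the region bounded by $\Gamma_1, \Gamma_2, \Gamma_3, \Gamma_4, \Gamma_5', \Gamma_{10}, \Gamma_{11}, \Gamma_{12}, \Gamma_{13}$,
so that

$$\int_{(2)}\psi(s,\chi)\,ds = \left(\int_{2-\infty i}^{2-Ui} - \int_{\Gamma_1+\Gamma_2+\Gamma_3+\Gamma_4+\Gamma_5'+\Gamma_{10}+\Gamma_{11}+\Gamma_{12}+\Gamma_{13}} + \int_{2+Ui}^{2+\infty i}\right)\psi(s,\chi)\,ds.$$

Moreover, the integral along each of these new arcs either tends to
zero or is

$$O\bigl(x e^{-\alpha\sqrt{\log x}}\bigr).$$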


It follows that

$$\sum_{\substack{p\le x\\ p\equiv l\ (\mathrm{mod}\ k)}}\log\frac{x}{p} = \frac{1}{h}\int_c^1\frac{x^s}{s^2}\,ds + O\bigl(xe^{-\alpha\sqrt{\log x}}\bigr), \tag{16}$$

which is the analog of Theorem 7-15. In exactly the same way as
Theorem 7-16 was deduced from Theorem 7-15, equation (16) leads
to the desired result:


THEOREM 7-25. If $k$ is a fixed integer and $(k, l) = 1$, then, as
$x \to \infty$,

$$\pi(x; k, l) = \frac{1}{h}\int_2^x \frac{du}{\log u} + O\bigl(xe^{-\alpha\sqrt{\log x}}\bigr).$$

As consequences of Theorem 7-25, we have that

$$\pi(x;k,l) \sim \frac{1}{\varphi(k)}\cdot\frac{x}{\log x},$$

and that, if $(k, l_1) = (k, l_2) = 1$, then

$$\lim_{x\to\infty}\frac{\pi(x;k,l_1)}{\pi(x;k,l_2)} = 1,$$

so that asymptotically there are equally many primes in the
progressions $kt + l_1$ and $kt + l_2$.
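The equidistribution of primes among the admissible residue classes can be checked numerically. The following Python sketch is only an illustration and not part of the argument: the function names `primes_up_to` and `prime_counts_by_residue` and the sample values of x and k are assumptions made here. It tabulates the number of primes up to x in each class l with (k, l) = 1 and compares each count with x/(h log x), where h is the number of such classes.

```python
from math import gcd, log


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes p <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Strike out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]


def prime_counts_by_residue(x, k):
    """Return {l: number of primes p <= x with p = l (mod k)} for (k, l) = 1."""
    counts = {l: 0 for l in range(k) if gcd(l, k) == 1}
    for p in primes_up_to(x):
        if p % k in counts:  # primes dividing k lie in no admissible class
            counts[p % k] += 1
    return counts


if __name__ == "__main__":
    x, k = 10**6, 12  # illustrative values only
    counts = prime_counts_by_residue(x, k)
    h = len(counts)  # number of residue classes prime to k
    for l, c in sorted(counts.items()):
        # Third column: ratio of the count to x / (h log x).
        print(l, c, round(c / (x / (h * log(x))), 4))
```

For moderate x the ratios in the last column differ noticeably from 1, since x/log x is a cruder approximation than the integral appearing in Theorem 7-25; the counts for different l, however, should be close to one another.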

A serious drawback of Theorem 7-25 is that the error term is not
uniform in $k$. This precludes applying this version of the theorem
to problems in which $k$ increases with $x$, and these unfortunately
are among the most important applications of this kind of theorem.
It is known that the error term in Theorem 7-25 is uniform in $k$ for
$k < \log^m x$ for some $m > 0$, in other words, that the relation
displayed in the theorem can be used if $k$ increases sufficiently slowly
with $x$. The proof of the more general theorem, while similar to that
given here, is more complicated. The chief difficulty is this: when
dealing with fixed $k$, it is enough to prove that $L(s, \chi) \ne 0$ for
$s = 1 + ti$ in order to deduce that for some $c_{11}(k)$, $L(s, \chi) \ne 0$ for
$1 - c_{11} < \sigma < 1$, $|t| < c_9(k)$. When $k$ increases, however, $c_{11}$ might
tend to zero quite rapidly as a function of $k$, in which case the integral
along $\Gamma_5'$ would not be negligible. It is therefore necessary to
investigate further the zeros of the L-functions near the line $\sigma = 1$ for
small $|t|$.


7-5 The integers representable as a sum of two squares. As a
final illustration of the methods of this chapter, we shall obtain an
asymptotic estimate for $B(x)$, the number of integers not exceeding
$x$ which can be written as a sum of two squares. The integers counted
are exactly those in whose prime-power factorization the primes
$r \equiv 3 \pmod 4$ occur only to even powers.* The following heuristic
argument indicates that it is to be expected that $B(x)$ is of the order
of magnitude of $x/\sqrt{\log x}$, which is in agreement with the result to
be obtained.

* Cf. Volume I, Theorem 7-3.

Take $x$ very large. Since one out of every $p$ integers is divisible
by $p$, the number of integers up to $x$ not divisible by $p$ is about
$x(1 - 1/p)$. Hence the number not divisible by any $p \le \sqrt{x}$ is
roughly

$$x\prod_{p\le\sqrt{x}}\left(1-\frac1p\right),$$


so that, by the Prime Number Theorem, 


$$\prod_{p\le\sqrt{x}}\left(1-\frac1p\right)\approx\frac{1}{\log x},$$

where the symbol "$\approx$" means "is probably of the order of magnitude
of." To count the integers contributing to $B(x)$, we do not
want to eliminate all composite numbers, but only those divisible
by an odd power of any of the various primes $r \equiv 3 \pmod 4$. As in
the cross-classification principle,* we can omit all those divisible
by $r$, then reintroduce those divisible by $r^2$, then take out those
divisible by $r^3$, etc., giving

$$x\left(1-\frac1r\right)\left(1+\frac1{r^2}\right)\cdots$$

as the number left after accounting for the one prime $r$. (The product
has only finitely many factors.) Hence

$$B(x)\approx x\prod_{r\le\sqrt{x}}\left(1-\frac1r\right)\prod_{r\le\sqrt x}\left(1+\frac1{r^2}\right)\cdots,$$

and since each product after the first converges as $x\to\infty$, we can
write simply

$$B(x)\approx x\prod_{r\le\sqrt{x}}\left(1-\frac1r\right).$$

Now


$$\log\prod_{p\le\sqrt x}\left(1-\frac1p\right)=\sum_{p\le\sqrt x}\log\left(1-\frac1p\right)\approx-\log\log x,$$

and since, by the results of the preceding section, about half the
$p$'s are $r$'s, we have

$$\log\prod_{r\le\sqrt x}\left(1-\frac1r\right)\approx-\frac12\log\log x=-\log\sqrt{\log x},$$

so that

$$B(x)\approx\frac{x}{\sqrt{\log x}}.$$

* Cf. Volume I, Theorem 6-4.
Probably the most that can be said for this argument is that after
seeing it, the reader should not be very surprised to learn that, for
some $b > 0$,

$$B(x)\sim\frac{bx}{\sqrt{\log x}}. \tag{17}$$

Nevertheless, it is just this type of reasoning which underlies the
proof of (17) which will now be developed.
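The heuristic can be compared with actual counts. The Python sketch below is an illustration only: the names `two_square_flags`, `is_two_square_by_factoring`, and `B`, and the sample values of x, are assumptions introduced here. It counts the integers up to x that are sums of two squares by direct enumeration, cross-checks the enumeration against the prime-factorization criterion quoted at the start of this section, and prints the ratio of B(x) to x/sqrt(log x).

```python
from math import isqrt, log, sqrt


def two_square_flags(x):
    """flags[n] is True exactly when n = a*a + c*c for some integers a, c >= 0."""
    flags = [False] * (x + 1)
    for a in range(isqrt(x) + 1):
        for c in range(a, isqrt(x - a * a) + 1):
            flags[a * a + c * c] = True
    return flags


def is_two_square_by_factoring(n):
    """Criterion from the text: primes r = 3 (mod 4) occur only to even powers."""
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if p % 4 == 3 and e % 2 == 1:
            return False
        p += 1
    # What remains is 1 or a single prime occurring to the first power.
    return n % 4 != 3


def B(x):
    """Number of n with 1 <= n <= x that are sums of two squares."""
    return sum(two_square_flags(x)[1:])


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):  # illustrative values only
        print(x, B(x), round(B(x) / (x / sqrt(log(x))), 4))
```

The ratio in the last column changes only slowly with x, which is what a relation of the form (17) with an error term of lower logarithmic order would lead one to expect.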
If we put

$$b_n=\begin{cases}1&\text{if } n=x^2+y^2 \text{ for some integers } x, y,\\0&\text{otherwise,}\end{cases}$$

then

$$B(x)=\sum_{n\le x}b_n.$$

For $\sigma>1$ let

$$f(s)=\sum_{n=1}^\infty\frac{b_n}{n^s};$$

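the series converges absolutely in this domain, and uniformly in any
closed bounded region to the right of the line $\sigma = 1$. Using $q$ and $r$ to
denote primes congruent to 1 and 3 (mod 4) respectively, we deduce
from the definition of $b_n$ and Theorem 6-3 that, for $\sigma > 1$,

$$\begin{aligned}
f(s) &= \left(1+\frac1{2^s}+\frac1{2^{2s}}+\cdots\right)\prod_q\left(1+\frac1{q^s}+\frac1{q^{2s}}+\cdots\right)\prod_r\left(1+\frac1{r^{2s}}+\frac1{r^{4s}}+\cdots\right)\\
&=(1-2^{-s})^{-1}\prod_q(1-q^{-s})^{-1}\prod_r(1-r^{-2s})^{-1}.
\end{aligned}$$

As was pointed out in formula (5) of the preceding chapter,

$$\zeta(s)L(s)=(1-2^{-s})^{-1}\prod_q(1-q^{-s})^{-2}\prod_r(1-r^{-2s})^{-1}, \tag{18}$$

where $L(s) = L(s, \chi)$ is the L-function for the nonprincipal character
(mod 4),

$$\chi(n)=\begin{cases}0&\text{if } n \text{ is even},\\(-1)^{(n-1)/2}&\text{if } n \text{ is odd}.\end{cases}$$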




(The relation (18) was proved earlier only for $s > 1$ and real; the
extension to the half-plane $\sigma > 1$ is immediate.) Hence, for $\sigma > 1$,

$$f^2(s)=(1-2^{-s})^{-1}\prod_r(1-r^{-2s})^{-1}\,\zeta(s)L(s). \tag{19}$$

Since $L$ is regular for $\sigma>0$ and

$$L(1)=1-\frac13+\frac15-\cdots=\frac\pi4,$$

the function $\zeta L$ has a simple pole at $s=1$, with residue $\pi/4$, but is
otherwise regular for $\sigma>0$. Moreover, neither $\zeta(s)$ nor $L(s)$ is
zero for $s$ in the region $\Omega$ of Fig. 7-3, for suitable positive $c_{11}$ and $c_9$.
Since the functions

$$(1-2^{-s})^{-1}\quad\text{and}\quad\prod_r(1-r^{-2s})^{-1} \tag{20}$$

are regular and different from zero for $\sigma>\tfrac12$, and bounded in
absolute value for $\sigma\ge\sigma_0>\tfrac12$, we deduce the following properties of $f$
from known properties of $\zeta$ and $L$.


THEOREM 7-26. (a) $f^2(s)$ is regular and different from zero in
the region $\Omega$ of Fig. 7-3, for suitable $c_{11}$ and $c_9$, and it has a simple
pole at $s = 1$, with residue

$$\frac\pi2\prod_r(1-r^{-2})^{-1}=b^2. \tag{21}$$

Hence $f$ is also regular in $\Omega$, and $f^2(s)\cdot(s-1)$ is regular in the
uncut region $\Omega'$ formed from $\Omega$ by omitting $\Gamma_6$, $\Gamma_7$, and $\Gamma_8$, and
joining $\Gamma_5$ and $\Gamma_9$.

(b) For $|t| > 8$ and $s$ in $\Omega$, the inequality $|f(s)| < c_{14}\log|t|$ holds
(cf. Theorem 7-11 (b)).

From this follows the usual consequence.


THEOREM 7-27. For suitable $c < 1$,

$$\sum_{n\le x}b_n\log\frac xn=\frac1{\pi i}\int_c^1\frac{x^s}{s^2}f(s)\,ds+O\bigl(xe^{-\alpha\sqrt{\log x}}\bigr).$$


Proof: The proof follows the lines of that of Theorem 7-15 as
regards changing the path of integration in the relation

$$\sum_{n\le x}b_n\log\frac xn=\frac1{2\pi i}\int_{(2)}\frac{x^s}{s^2}f(s)\,ds$$

and estimating the new integrals along those paths which are
bounded away from $s = 1$; the only change is that the estimate
$|f(s)| < c_{14}\log|t|$ is used rather than $|\log\zeta(s)| < \log^2|t|$. Omitting
the tiresome details, we arrive at the relation

$$\sum_{n\le x}b_n\log\frac xn=\frac1{2\pi i}\left(-\int_{\Gamma_6}-\int_{\Gamma_7}-\int_{\Gamma_8}\right)\frac{x^s}{s^2}f(s)\,ds+O\bigl(xe^{-\alpha\sqrt{\log x}}\bigr).$$

In the neighborhood of $s = 1$, $f(s)$ has the expansion

$$f(s)=\frac b{\sqrt{s-1}}+\cdots,$$

with $b$ as in (21). Here $\sqrt{s-1}>0$ for $s>1$. Putting $s=1+\delta e^{i\theta}$,
we have

$$\int_{\Gamma_7}\frac{x^s}{s^2}f(s)\,ds=O\left(\frac{x^{1+\delta}}{(1-\delta)^2}\cdot\frac1{\sqrt\delta}\cdot2\pi\delta\right)=o(1)$$

as $\delta\to0$.

Since $f^2(s)(s-1)$ is single-valued in $\Omega'$, the quantity $2\arg f(s)+\arg(s-1)$
is unchanged by traversing a path in $\Omega$ from $\Gamma_8$ to $\Gamma_6$.
Since $\arg(s-1)$ increases by $2\pi$, $\arg f(s)$ decreases by $\pi$, so that $f(s)$
has opposite signs on the two edges of the cut. Hence

$$\int_{\Gamma_6}\frac{x^s}{s^2}f(s)\,ds+\int_{\Gamma_8}\frac{x^s}{s^2}f(s)\,ds=2\int_{\Gamma_8}\frac{x^s}{s^2}f(s)\,ds,$$

and

$$\sum_{n\le x}b_n\log\frac xn=\frac1{\pi i}\int_{1-c_{11}}^1\frac{x^s}{s^2}f(s)\,ds+O\bigl(xe^{-\alpha\sqrt{\log x}}\bigr).$$

The proof is complete.


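THEOREM 7-28. As $x\to\infty$,

$$B(x)=\frac{Bx}{\sqrt{\log x}}+O\left(\frac{x}{\log^{3/4}x}\right),$$

where

$$B=\frac1{\sqrt2}\prod_r\left(1-\frac1{r^2}\right)^{-1/2}.$$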
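Proof: On $\Gamma_8$ we have

$$\frac{f(s)}{s^2}=\frac{bi}{\sqrt{1-s}\,\bigl(1-(1-s)\bigr)^2}+O(\sqrt{1-s})=\frac{bi}{\sqrt{1-s}}+O(\sqrt{1-s})$$

as $s\to1^-$, so that

$$\begin{aligned}
\sum_{n\le x}b_n\log\frac xn
&=\frac b\pi\int_{1-c_{11}}^1\frac{x^s}{\sqrt{1-s}}\,ds+O\left(\int_{1-c_{11}}^1x^s\sqrt{1-s}\,ds\right)+O\left(\frac x{\log^2x}\right)\\
&=\frac b\pi\int_0^{c_{11}}x^{1-u}u^{-1/2}\,du+O\left(\int_0^{c_{11}}x^{1-u}u^{1/2}\,du\right)+O\left(\frac x{\log^2x}\right)\\
&=\frac{bx}\pi\int_0^{c_{11}}e^{-u\log x}u^{-1/2}\,du+O\left(x\int_0^{c_{11}}e^{-u\log x}u^{1/2}\,du\right)+O\left(\frac x{\log^2x}\right)\\
&=\frac{bx}\pi\int_0^{c_{11}\log x}e^{-v}\left(\frac v{\log x}\right)^{-1/2}\frac{dv}{\log x}+O\left(x\int_0^{\infty}e^{-v}\left(\frac v{\log x}\right)^{1/2}\frac{dv}{\log x}\right)+O\left(\frac x{\log^2x}\right)\\
&=\frac{bx}{\pi\sqrt{\log x}}\int_0^{c_{11}\log x}e^{-v}v^{-1/2}\,dv+O\left(\frac x{\log^{3/2}x}\right)\\
&=\frac{bx}{\pi\sqrt{\log x}}\left(\Gamma\left(\tfrac12\right)-\int_{c_{11}\log x}^\infty e^{-v}v^{-1/2}\,dv\right)+O\left(\frac x{\log^{3/2}x}\right)\\
&=\frac{bx}{\pi\sqrt{\log x}}\left\{\sqrt\pi+O\left(\frac1{c_{11}\log x}\right)\right\}+O\left(\frac x{\log^{3/2}x}\right).
\end{aligned}$$

Thus

$$\sum_{n\le x}b_n\log\frac xn=\frac{Bx}{\sqrt{\log x}}+O\left(\frac x{\log^{3/2}x}\right),$$

where

$$B=\frac b{\sqrt\pi}=\frac1{\sqrt2}\prod_r(1-r^{-2})^{-1/2}.$$

Now let $\delta=\delta(x)$ be positive. Then

$$\begin{aligned}
\sum_{n\le x+\delta x}b_n\log\frac{x+\delta x}n-\sum_{n\le x}b_n\log\frac xn
&=\log(1+\delta)\sum_{n\le x}b_n+\sum_{x<n\le x+\delta x}b_n\log\frac{x+\delta x}n\\
&=\log(1+\delta)\,B(x)+O\bigl(\log(1+\delta)\cdot\delta x\bigr),
\end{aligned}$$

while

$$\begin{aligned}
\frac{B(x+\delta x)}{\sqrt{\log(x+\delta x)}}-\frac{Bx}{\sqrt{\log x}}
&=\frac{Bx}{\sqrt{\log x}}\left(\frac{1+\delta}{\sqrt{\log(x+\delta x)/\log x}}-1\right)\\
&=\frac{Bx}{\sqrt{\log x}}\left(\frac{1+\delta}{\sqrt{1+\log(1+\delta)/\log x}}-1\right)\\
&=\frac{Bx}{\sqrt{\log x}}\left(\frac{1+\delta}{1+O(\delta/\log x)}-1\right)\\
&=\frac{Bx}{\sqrt{\log x}}\cdot\frac{\delta+O(\delta/\log x)}{1+O(\delta/\log x)}\\
&=\frac{Bx}{\sqrt{\log x}}\bigl(\delta+O(\delta/\log x)\bigr).
\end{aligned}$$

Hence, since $\log(1+\delta)=\delta+O(\delta^2)$ as $\delta\to0$,

$$\begin{aligned}
B(x)&=\frac{Bx}{\sqrt{\log x}}\cdot\frac\delta{\log(1+\delta)}\left(1+O\left(\frac1{\log x}\right)\right)+O(\delta x)+O\left(\frac x{\delta\log^{3/2}x}\right)\\
&=\frac{Bx}{\sqrt{\log x}}\bigl(1+O(\delta)\bigr)+O(\delta x)+O\left(\frac x{\delta\log^{3/2}x}\right).
\end{aligned}$$

Choosing $\delta(x)=\log^{-3/4}x$, we obtain

$$B(x)=\frac{Bx}{\sqrt{\log x}}+O\left(\frac x{\log^{3/4}x}\right)+O\left(\frac x{\log^{3/4}x}\right)=\frac{Bx}{\sqrt{\log x}}+O\left(\frac x{\log^{3/4}x}\right),$$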

and the proof is complete. 
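The constant B of Theorem 7-28 is easy to approximate from its product formula. The Python sketch below is an illustration under stated assumptions: the function names and the truncation point 10^5 are choices made here, not part of the text. It multiplies the factors (1 - 1/r^2) over the primes r = 3 (mod 4) up to a limit and returns 1/sqrt(2 * product). Since the omitted factors differ from 1 by less than 1/r^2, the truncation error is small.

```python
from math import sqrt


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes p <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]


def partial_B(limit):
    """Approximate B = 2**(-1/2) * prod_r (1 - r**-2)**(-1/2), r = 3 (mod 4), r <= limit."""
    prod = 1.0
    for r in primes_up_to(limit):
        if r % 4 == 3:
            prod *= 1.0 - 1.0 / (r * r)
    return 1.0 / sqrt(2.0 * prod)


if __name__ == "__main__":
    print(partial_B(10**5))  # truncation point chosen for illustration
```

The value obtained, about 0.7642, is the number usually called the Landau-Ramanujan constant in the later literature (a name not used in the text).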


SUPPLEMENTARY READING 


Chapter 1 


Dickson, L. E., Introduction to the Theory of Numbers, Chicago: University 
of Chicago Press, 1929. 

Dickson, L. E., Modern Elementary Theory of Numbers, Chicago: University
of Chicago Press, 1939.

Ford, L. R., An Introduction to the Theory of Automorphic Functions,
London: George Bell & Sons, Ltd., 1915. Reprinted, Chelsea Publishing
Company, New York, 1951.

Jones, B. W., The Arithmetic Theory of Quadratic Forms, Carus Mathematical
Monograph #10, Buffalo, N.Y.: Mathematical Association of
America, 1950. Distributed by John Wiley & Sons, Inc., New York.

Klein, F., Vorlesungen über die Theorie der Elliptischen Modulfunktionen,
Leipzig: Teubner Verlagsgesellschaft, 1890-1892.


Chapter 2 


Hecke, E., Vorlesungen über die Theorie der Algebraischen Zahlen, Leipzig:
Akademische Verlagsgesellschaft m.b.H., 1923. Reprinted, Chelsea
Publishing Company, New York, 1948.

Landau, E., Vorlesungen über Zahlentheorie, vol. 3, Leipzig: S. Hirzel
Verlag, 1927. Reprinted, Chelsea Publishing Company, New York,
1947.

Pollard, H., The Theory of Algebraic Numbers, Carus Mathematical
Monograph #9, Buffalo, N.Y.: Mathematical Association of America,
1950. Distributed by John Wiley & Sons, Inc., New York.

Reid, L. W., Elements of the Theory of Algebraic Numbers, New York:
The Macmillan Company, 1910.

Weyl, H., Algebraic Theory of Numbers, Annals of Mathematics Studies,
#1, Princeton: Princeton University Press, 1940.


Chapter 3 


Landau, E., Vorlesungen über Zahlentheorie, vol. 3.

Mordell, L. J., Three Lectures on Fermat's Last Theorem, New York:
Cambridge University Press, 1921.


Chapter 5 


Gelfond, A. O., The Approximation of Algebraic Numbers by Algebraic Numbers
and the Theory of Transcendental Numbers, American Mathematical
Society Translation #65, Providence: American Mathematical Society,
1952. Translated from Uspekhi Matematicheskikh Nauk (Moscow) 4,
no. 4 (32), 19-49 (1949).

Koksma, J. F., Diophantische Approximationen, Berlin: Springer-Verlag
OHG, 1936. (Ergebnisse der Mathematik, vol. 4, no. 4.) Reprinted,
Chelsea Publishing Company, New York, 1951.

Perron, O., Irrationalzahlen, 2nd edition, Berlin: Walter De Gruyter
& Co., 1929.

Siegel, C. L., Transcendental Numbers, Annals of Mathematics Studies,
#16, Princeton: Princeton University Press, 1949.


Chapter 6 


Hasse, H., Vorlesungen über Zahlentheorie, Berlin: Springer-Verlag OHG,
1950.

Landau, E., Vorlesungen über Zahlentheorie, vol. 1.


Chapter 7 


Estermann, T., Introduction to Modern Prime Number Theory, Cambridge
Tracts, #41, New York: Cambridge University Press, 1952.

Ingham, A. E., The Distribution of Prime Numbers, Cambridge Tracts,
#30, New York: Cambridge University Press, 1932.

Landau, E., Vorlesungen über Zahlentheorie, vol. 2.

Landau, E., Handbuch der Lehre von der Verteilung der Primzahlen, Leipzig:
Teubner Verlagsgesellschaft, 1909. Reprinted, Chelsea Publishing
Company, New York, 1953.

Landau, E., Einführung in die Elementare und Analytische Theorie der
Algebraischen Zahlen und der Ideale, Leipzig: Teubner Verlagsgesellschaft,
1918. Reprinted, Chelsea Publishing Company, New York, 1949.


LIST OF SYMBOLS 


Γ, unimodular group, 8

[a, b, cl], quadratic form, 15 
I'4(f), group of automorphs, 25 
R, rational field, 38 

R{z], polynomials over R, 38 

deg p, degree of a polynomial, 38 
R(&#), algebraic number field, 42 
Z, rational integers, 48 

R{s], integral domain, 48 

N, norm, 48, 68 

|, divides, 53, 65 

S, trace, 73 

[1, 75, 124 

~, equivalent ideals, 82 

K,, cyclotomic field, 85 

m, prime in Kz, 85 

|, exactly divides, 111 

|| |], 124 

H, height of polynomial, 124 

M (é), Markov’s constant, 166 
ζ(s), Riemann's function, 201
M(k), group of residues prime to k, 207
h, φ(k), 207

χ(a), character, 210

X(k), group of characters, 211
L(s, χ), Dirichlet's function, 214
ζ(s, w), Hurwitz' function, 232
Ω, region of integration, 247

π(x; k, l), number of primes p ≡ l (mod k) with p ≤ x, 252


INDEX 


A-number, 171 
associate, 53 
automorph, 18 


Barnes, E. S., 81 

basis, integral, 50 
of a field, 50 
of a group of units, 75 
of an ideal, 59 
of a pure cubic field, 105 
of K,, 87 
of Ri/d], 54 


Cantor, G., 166 

Catalan, E., 154, 160 

character, 210 

class number, 83 

completely multiplicative, 203 
congruence, modulo an ideal, 67 
conjugate algebraic numbers, 40 


Dedekind, R., 72, 105, 120 
Delaunay, B., 112, 120 
Dirichlet, P. L., 75, 201 
Dirichlet series, 201 
discriminant, of a field, 52 
of an ideal, 61 
of a quadratic form, 4 
of a set of algebraic numbers, 49 
of K,, 86 
of R[-/d], 85 
domain, Euclidean, 56 
integral, 48 
unique factorization, 56 
Dyson, F. J., 123, 160 


Eisenstein’s irreducibility criterion, 
46, 67 


equivalent ideals, 82 
equivalent points, 9 
Euler’s constant, 161 
extension, algebraic, 44 


Fermat’s conjecture, 93 
field, 41 
algebraic number, 42 
cyclotomic, 85 
pure cubic, 104 
field conjugate, 43 
fundamental region, 9 
of ', 9 
of T',4 f ); 26 
Fundamental Theorem of Algebra, 
35 


Gauss, C. F., 63 

Gelfond, A. O., 188, 198, 200 

greatest common divisor, of ideals, 
65 

group, 6 


Hadamard, J., 229 

height, of an algebraic number, 124 

Hilbert-Gelfond-Schneider theorem, 
198 

Hille, E., 200 

Hurwitz, A., 63, 121 

Hurwitz ζ-function, 232


ideal, 58 
prime, 66
principal, 58 
index of a polynomial, 135 
Inkeri, K., 81 
integer, algebraic, 47 
rational, 47 




irrationality, of e, 162 
of π, 163


Kummer, E., 97 


law of quadratic reciprocity, 92 
Lehmer, D. H. and E., 103, 120
LeVeque, W. J., 172 

Liouville, J., 121, 160, 165, 200 
Liouville number, 165 
Liouville’s theorem, 121 


Mahler, K., 155, 160, 171, 174, 
200 
matrix, 2 
of a quadratic form, 4 
measure of transcendence, 170 
for e, 186 
modular group, 8. 
Mordell, L. J., 120, 155 


Nagell, T., 112, 120, 229 

Newman, M., 157, 160 

Niven, I., 163, 200 

norm, of an algebraic number, 48 
of an ideal, 68 

number, algebraic, 39 


Oblath, R., 160 


Pell’s equation, 25, 55, 74, 154 
period of reduced forms, 31 
Polya, G., 187, 200 
polynomial, monic, 38 
primary, 88 
prime, algebraic, 55 
regular, 97 
primitive element, 44 
product, of determinants, 35 
of ideals, 62 




proper representation, 19 


quadratic form, 1 
definite, 15 
equivalent, 2 
indefinite, 22 
integral, 18 
primitive, 21 
reduced, 5, 16, 23 


representative of a form, 16 
residue class (mod A), 67 
Riemann ζ-function, 201
roots of unity, 75, 85 
Rosser, J. B., 103 

Roth, K. F., 123, 160 


S-number, 171 

Schneider, T., 123, 160, 187, 198, 
200 

Siegel, C. L., 123, 160, 198, 200 

Swinnerton-Dyer, H.P.F., 81 

Symmetric Function Theorem, 35 


T-number, 171 
Thue, A., 122, 160 
Thue-Siegel-Roth Theorem, 148 
trace, 73 
transcendence, of e, 186, 199 

of π, 186
transcendental number, 165 


U-number, 171 
unit of R[F], 53 


Vandiver, H. S., 103, 120
Varnavides, P., 81 


Wronskian, 128 
generalized, 129 


page 


23 


23 


25 


27 


33 


35 


35 


39 


36 


44 


45 


54 


line 


-11 


12 

-3 
8,9,11,13 
17 


—14 


replace 


y” 


lyz — | 


—b/a 


(m2) 


4 (to + uoWD)” 


z=ax2t+ty 


(cf. Problem 3, 
Section 1-10). 


In 1S 


P(z1, ..., fn). In 


n, 
every polynomial 
yp 
C2 


2b = 1(mod 4) 
271 


Errata in Topics in Number Theory, Vol. II


with 


z=at+ty 


delete this 
Zn, and of degree g in 7}, is 


P(a1, ..., Ln). It is of total 
degree g in the former. In 


n, and leading coefficient 1, 
every nonconstant polynomial 
y 

Gi 


2b = 1(mod 2) 


272 
69 


83 


87 


125 


129 


151 


165 


168 


169 


206 


223 


228 


236 


238 


242 


243 


243 


—10 


ERRATA, VOL. II 


are distinct ideals 
a field basis 
l<r<p-1 

a (t + 1)th 
dependent 

mr (1+) 


3-19 

j=2 

[Qu + 2] 

(6) 

6-17 
converges to 
principle. 


the line 


add new line 


add at end of line 


et(l+c2 +c4) 


are ideals 

an integral basis 
l<r<p-lrFs 
a primitive (t + 1)th 


dependent over K 


6-16 


converges uniformly for 

o > 01 > Oo, and hence to 

principle applied to ef). 
oo+az 


lim f(s) ds = 


co—ai 
yu 


lim / f(oo + ti)idt 


a-> 


Clearly we may suppose c2 < 1. 
Suppose that c, > 0. 


and hence for z > e4#(+¢4), 


et(it+ea) 


243 


244 


245 


245 


250 


ERRATA, VOL. II 


1 2 
es a | 
% 9% 


max(Cs, C6). 
the region (12), 


Cg — C2 


+ O(6*z). 


1 2 
Oo-—l oo—1 


max (cs,c6, e404) ) . 
the region |s — so| < 1+ 


C2 — Cg 


+ O(2°) + O(6?z). 


273 


C8 


log to’ 


A CATALOG OF SELECTED 
DOVER BOOKS 


IN SCIENCE AND MATHEMATICS 


CATALOG OF DOVER BOOKS 


Astronomy 


BURNHAM’S CELESTIAL HANDBOOK, Robert Burnham, Jr. Thorough guide 

to the stars beyond our solar system. Exhaustive treatment. Alphabetical by constel- 

lation: Andromeda to Cetus in Vol. 1; Chamaeleon to Orion in Vol. 2; and Pavo to 
Vulpecula in Vol. 3. Hundreds of illustrations. Index in Vol. 3. 2,000pp. 6% x 914. 

Vol. I: 0-486-23567-X 

Vol. II: 0-486-23568-8 

Vol. III: 0-486-23673-0 


EXPLORING THE MOON THROUGH BINOCULARS AND SMALL TELE- 
SCOPES, Ernest H. Cherrington, Jr. Informative, profusely illustrated guide to locat- 
ing and identifying craters, rills, seas, mountains, other lunar features. Newly revised 
and updated with special section of new photos. Over 100 photos and diagrams. 
240pp. 8% x 11. 0-486-24491-1 


THE EXTRATERRESTRIAL LIFE DEBATE, 1750-1900, Michael J. Crowe. First 
detailed, scholarly study in English of the many ideas that developed from 1750 to 
1900 regarding the existence of intelligent extraterrestrial life. Examines ideas of 
Kant, Herschel, Voltaire, Percival Lowell, many other scientists and thinkers. 16 illus- 
trations. 704pp. 5% x 84. 0-486-40675-X 


THEORIES OF THE WORLD FROM ANTIQUITY TO THE COPERNICAN 
REVOLUTION, Michael J. Crowe. Newly revised edition of an accessible, enlight- 
ening book recreates the change from an earth-centered to a sun-centered concep- 
tion of the solar system. 242pp. 5% x 84. 0-486-41444-2 


A HISTORY OF ASTRONOMY, A. Pannekoek. Well-balanced, carefully reasoned 

study covers such topics as Ptolemaic theory, work of Copernicus, Kepler, Newton, 

Eddington’s work on stars, much more. Illustrated. References. 521pp. 5% x 8%. 
0-486-65994-] 


A COMPLETE MANUAL OF AMATEUR ASTRONOMY: TOOLS AND 
TECHNIQUES FOR ASTRONOMICAL OBSERVATIONS, P. Clay Sherrod 
with Thomas L. Koed. Concise, highly readable book discusses: selecting, setting up 
and maintaining a telescope; amateur studies of the sun; lunar topography and occul- 
tations; observations of Mars, Jupiter, Saturn, the minor planets and the stars; an 
introduction to photoelectric photometry; more. 1981 ed. 124 figures. 25 halftones. 
37 tables. 335pp. 64 x 914. 0-486-40675-X 


AMATEUR ASTRONOMER’S HANDBOOK, J. B. Sidgwick. Timeless, compre- 
hensive coverage of telescopes, mirrors, lenses, mountings, telescope drives, microm- 
eters, spectroscopes, more. 189 illustrations. 576pp. 5% x 84. (Available in U.S. only.) 

0-486-24034-7 


STARS AND RELATIVITY, Ya. B. Zel’dovich and I. D. Novikov. Vol. 1 of Relativistic 
Astrophysics by famed Russian scientists. General relativity, properties of matter under 
astrophysical conditions, stars, and stellar systems. Deep physical insights, clear pre- 
sentation. 1971 edition. References. 544pp. 5% x 84. 0-486-69424-0 


CATALOG OF DOVER BOOKS 


Chemistry 


THE SCEPTICAL CHYMIST: THE CLASSIC 1661 TEXT, Robert Boyle. Boyle 

defines the term “element,” asserting that all natural phenomena can be explained by 

the motion and organization of primary particles. 1911 ed. viiit232pp. 5% x 84. 
0-486-42825-7 


RADIOACTIVE SUBSTANCES, Marie Curie. Here is the celebrated scientist’s 
doctoral thesis, the prelude to her receipt of the 1903 Nobel Prize. Curie discusses 
establishing atomic character of radioactivity found in compounds of uranium and 
thorium; extraction from pitchblende of polonium and radium; isolation of pure radi- 
um chloride; determination of atomic weight of radium; plus electric, photographic, 
luminous, heat, color effects of radioactivity. iit+t94pp. 5% x 84. 0-486-42550-9 


CHEMICAL MAGIC, Leonard A. Ford. Second Edition, Revised by E. Winston 
Grundmeier. Over 100 unusual stunts demonstrating cold fire, dust explosions, 
much more. Text explains scientific principles and stresses safety precautions. 
128pp. 5% x 84. 0-486-67628-5 


THE DEVELOPMENT OF MODERN CHEMISTRY, Aaron J. Ihde. Authorita- 
tive history of chemistry from ancient Greek theory to 20th-century innovation. 
Covers major chemists and their discoveries. 209 illustrations. 14 tables. 
Bibliographies. Indices. Appendices. 851pp. 5% x 8'4. 0-486-64235-6 


CATALYSIS IN CHEMISTRY AND ENZYMOLOGY, William P. Jencks. 

Exceptionally clear coverage of mechanisms for catalysis, forces in aqueous solution, 

carbonyl- and acyl-group reactions, practical kinetics, more. 864pp. 5% x 8/4. 
0-486-65460-5 


ELEMENTS OF CHEMISTRY, Antoine Lavoisier. Monumental classic by founder 
of modern chemistry in remarkable reprint of rare 1790 Kerr translation. A must for 
every student of chemistry or the history of science. 539pp. 5% x 8%. 0-486-64624-6 


THE HISTORICAL BACKGROUND OF CHEMISTRY, Henry M. Leicester. 
Evolution of ideas, not individual biography. Concentrates on formulation of a coher- 
ent set of chemical laws. 260pp. 5% x 8’. 0-486-61053-5 


A SHORT HISTORY OF CHEMISTRY, J. R. Partington. Classic exposition 
explores origins of chemistry, alchemy, early medical chemistry, nature of atmos- 
phere, theory of valency, laws and structure of atomic theory, much more. 428pp. 
5% x 8%. (Available in U.S. only.) 0-486-65977-1 


GENERAL CHEMISTRY, Linus Pauling. Revised 3rd edition of classic first-year 
text by Nobel laureate. Atomic and molecular structure, quantum mechanics, statis- 
tical mechanics, thermodynamics correlated with descriptive chemistry. Problems. 
992pp. 5% x 8's. 0-486-65622-5 


FROM ALCHEMY TO CHEMISTRY, John Read. Broad, humanistic treatment 
focuses on great figures of chemistry and ideas that revolutionized the science. 50 
illustrations. 240pp. 5% x 8%. 0-486-28690-8 


CATALOG OF DOVER BOOKS 
Engineering 


DE RE METALLICA, Georgius Agricola. The famous Hoover translation of great- 
est treatise on technological chemistry, engineering, geology, mining of early mod- 
ern times (1556). All 289 original woodcuts. 638pp. 6% x 11. 0-486-60006-8 


FUNDAMENTALS OF ASTRODYNAMICS, Roger Bate et al. Modern approach 
developed by U.S. Air Force Academy. Designed as a first course. Problems, exer- 
cises. Numerous illustrations. 455pp. 5% x 8%. 0-486-60061-0 


DYNAMICS OF FLUIDS IN POROUS MEDIA, Jacob Bear. For advanced stu- 

dents of ground water hydrology, soil mechanics and physics, drainage and irrigation 

engineering and more. 335 illustrations. Exercises, with answers. 784pp. 6% x 914. 
0-486-65675-6 


THEORY OF VISCOELASTICITY (Second Edition), Richard M. Christensen. 
Complete consistent description of the linear theory of the viscoelastic behavior of 
materials. Problem-solving techniques discussed. 1982 edition. 29 figures. 
xiv+364pp. 64% x 94. 0-486-42880-X 


MECHANICS, J. P. Den Hartog. A classic introductory text or refresher. Hundreds 
of applications and design problems illuminate fundamentals of trusses, loaded 
beams and cables, etc. 334 answered problems. 462pp. 5% x 8/4. 0-486-60754-2 


MECHANICAL VIBRATIONS, J. P. Den Hartog. Classic textbook offers lucid 
explanations and illustrative models, applying theories of vibrations to a variety of 
practical industrial engineering problems. Numerous figures. 233 problems, solu- 
tions. Appendix. Index. Preface. 436pp. 5% x 8%. 0-486-64785-4 


STRENGTH OF MATERIALS, J. P. Den Hartog. Full, clear treatment of basic 
material (tension, torsion, bending, etc.) plus advanced material on engineering 
methods, applications. 350 answered problems. 323pp. 5% x 81. 0-486-60755-0 


A HISTORY OF MECHANICS, René Dugas. Monumental study of mechanical 
principles from antiquity to quantum mechanics. Contributions of ancient Greeks, 
Galileo, Leonardo, Kepler, Lagrange, many others. 671pp. 5% x 84. 0-486-65632-2 


STABILITY THEORY AND ITS APPLICATIONS TO STRUCTURAL 
MECHANICS, Clive L. Dym. Self-contained text focuses on Koiter postbuckling 
analyses, with mathematical notions of stability of motion. Basing minimum energy 
principles for static stability upon dynamic concepts of stability of motion, it devel- 
ops asymptotic buckling and postbuckling analyses from potential energy considera- 
tions, with applications to columns, plates, and arches. 1974 ed. 208pp. 5% x 814. 
0-486-42541-X 


METAL FATIGUE, N. E. Frost, K. J. Marsh, and L. P. Pook. Definitive, clearly writ- 
ten, and well-illustrated volume addresses all aspects of the subject, from the histori- 
cal development of understanding metal fatigue to vital concepts of the cyclic stress 
that causes a crack to grow. Includes 7 appendixes. 544pp. 5% x 84. 0-486-40927-9 


CATALOG OF DOVER BOOKS 


ROCKETS, Robert Goddard. Two of the most significant publications in the history 
of rocketry and jet propulsion: “A Method of Reaching Extreme Altitudes” (1919) and 
“Liquid Propellant Rocket Development” (1936). 128pp. 5% x 8'4. 0-486-42537-1 


STATISTICAL MECHANICS: PRINCIPLES AND APPLICATIONS, Terrell L. 

Hill. Standard text covers fundamentals of statistical mechanics, applications to fluc- 

tuation theory, imperfect gases, distribution functions, more. 448pp. 5% x 844. 
0-486-65390-0 


ENGINEERING AND TECHNOLOGY 1650-1750: ILLUSTRATIONS AND 
TEXTS FROM ORIGINAL SOURCES, Martin Jensen. Highly readable text with 
more than 200 contemporary drawings and detailed engravings of engineering pro- 
jects dealing with surveying, leveling, materials, hand tools, lifting equipment, trans- 
port and erection, piling, bailing, water supply, hydraulic engineering, and more. 
Among the specific projects outlined-transporting a 50-ton stone to the Louvre, erect- 
ing an obelisk, building timber locks, and dredging canals. 207pp. 8% x 114. 
0-486-42232-1 


THE VARIATIONAL PRINCIPLES OF MECHANICS, Cornelius Lanczos. 
Graduate level coverage of calculus of variations, equations of motion, relativistic 
mechanics, more. First inexpensive paperbound edition of classic treatise. Index. 
Bibliography. 418pp. 5% x 84. 0-486-65067-7 


PROTECTION OF ELECTRONIC CIRCUITS FROM OVERVOLTAGES, 
Ronald B. Standler. Five-part treatment presents practical rules and strategies for cir- 
cuits designed to protect electronic systems from damage by transient overvoltages. 
1989 ed. xxivt+434pp. 61 x 9%. 0-486-42552-5 


ROTARY WING AERODYNAMICS, W. Z. Stepniewski. Clear, concise text cov- 

ers aerodynamic phenomena of the rotor and offers guidelines for helicopter perfor- 

mance evaluation. Originally prepared for NASA. 537 figures. 640pp. 6% x 914. 
0-486-64647-5 


INTRODUCTION TO SPACE DYNAMICS, William Tyrrell Thomson. Com- 
prehensive, classic introduction to space-flight engineering for advanced undergrad- 
uate and graduate students. Includes vector algebra, kinematics, transformation of 
coordinates. Bibliography. Index. 352pp. 5% x 84. 0-486-65113-4 


HISTORY OF STRENGTH OF MATERIALS, Stephen P. Timoshenko. Excellent 
historical survey of the strength of materials with many references to the theories of 
elasticity and structure. 245 figures. 452pp. 5% x 8/4. 0-486-61187-6 


ANALYTICAL FRACTURE MECHANICS, David J. Unger. Self-contained text 
supplements standard fracture mechanics texts by focusing on analytical methods for 
determining crack-tip stress and strain fields. 336pp. 6% x 914. 0-486-41737-9 


STATISTICAL MECHANICS OF ELASTICITY, J. H. Weiner. Advanced, self- 
contained treatment illustrates general principles and elastic behavior of solids. Part 
1, based on classical mechanics, studies thermoelastic behavior of crystalline and 
polymeric solids. Part 2, based on quantum mechanics, focuses on interatomic force 
laws, behavior of solids, and thermally activated processes. For students of physics 
and chemistry and for polymer physicists. 1983 ed. 96 figures. 496pp. 5% x 84. 
0-486-42260-7 


CATALOG OF DOVER BOOKS 


Mathematics 


FUNCTIONAL ANALYSIS (Second Corrected Edition), George Bachman and 
Lawrence Narici. Excellent treatment of subject geared toward students with back- 
ground in linear algebra, advanced calculus, physics and engineering. Text covers 
introduction to inner-product spaces, normed, metric spaces, and topological spaces; 
complete orthonormal sets, the Hahn-Banach Theorem and its consequences, and 
many other related subjects. 1966 ed. 544pp. 614 x 94. 0-486-40251-7 


ASYMPTOTIC EXPANSIONS OF INTEGRALS, Norman Bleistein & Richard A. 
Handelsman. Best introduction to important field with applications in a variety of sci- 
entific disciplines. New preface. Problems. Diagrams. Tables. Bibliography. Index. 
448pp. 5% x 84. 0-486-65082-0 


VECTOR AND TENSOR ANALYSIS WITH APPLICATIONS, A. I. Borisenko 
and I. E. Tarapov. Concise introduction. Worked-out problems, solutions, exercises. 
257pp. 5% x 84. 0-486-63833-2 


AN INTRODUCTION TO ORDINARY DIFFERENTIAL EQUATIONS, Earl 
A. Coddington. A thorough and systematic first course in elementary differential 
equations for undergraduates in mathematics and science, with many exercises and 
problems (with answers). Index. 304pp. 5% x 814. 0-486-65942-9 


FOURIER SERIES AND ORTHOGONAL FUNCTIONS, Harry F. Davis. An 
incisive text combining theory and practical example to introduce Fourier series, 
orthogonal functions and applications of the Fourier method to boundary-value 
problems. 570 exercises. Answers and notes. 416pp. 5% x 8’. 0-486-65973-9 


COMPUTABILITY AND UNSOLVABILITY, Martin Davis. Classic graduate- 
level introduction to theory of computability, usually referred to as theory of recur- 
rent functions. New preface and appendix. 288pp. 5% x 8/4. 0-486-61471-9 


ASYMPTOTIC METHODS IN ANALYSIS, N. G. de Bruijn. An inexpensive, com- 
prehensive guide to asymptotic methods—the pioneering work that teaches by 
explaining worked examples in detail. Index. 224pp. 5% x 8 0-486-64221-6 


APPLIED COMPLEX VARIABLES, John W. Dettman. Step-by-step coverage of 
fundamentals of analytic function theory—plus lucid exposition of five important 
applications: Potential Theory; Ordinary Differential Equations; Fourier Transforms; 
Laplace Transforms; Asymptotic Expansions. 66 figures. Exercises at chapter ends. 
512pp. 5% x 84. 0-486-64670-X 


INTRODUCTION TO LINEAR ALGEBRA AND DIFFERENTIAL EQUA- 
TIONS, John W. Dettman. Excellent text covers complex numbers, determinants, 
orthonormal bases, Laplace transforms, much more. Exercises with solutions. 
Undergraduate level. 416pp. 5% x 8%. 0-486-65191-6 


RIEMANN’S ZETA FUNCTION, H. M. Edwards. Superb, high-level study of 
landmark 1859 publication entitled “On the Number of Primes Less Than a Given 
Magnitude” traces developments in mathematical theory that it inspired. xiv+315pp. 
5% x 8h. 0-486-41740-9 


CATALOG OF DOVER BOOKS 


CALCULUS OF VARIATIONS WITH APPLICATIONS, George M. Ewing. 
Applications-oriented introduction to variational theory develops insight and pro- 
motes understanding of specialized books, research papers. Suitable for advanced 
undergraduate/graduate students as primary, supplementary text. 352pp. 5% x 34. 
0-486-64856-7 


COMPLEX VARIABLES, Francis J. Flanigan. Unusual approach, delaying complex 
algebra till harmonic functions have been analyzed from real variable viewpoint. 
Includes problems with answers. 364pp. 5% x 81. 0-486-61388-7 


AN INTRODUCTION TO THE CALCULUS OF VARIATIONS, Charles Fox.
Graduate-level text covers variations of an integral, isoperimetrical problems, least
action, special relativity, approximations, more. References. 279pp. 5% x 8's.
0-486-65499-0


COUNTEREXAMPLES IN ANALYSIS, Bernard R. Gelbaum and John M. H. 
Olmsted. These counterexamples deal mostly with the part of analysis known as 
“real variables.” The first half covers the real number system, and the second half 
encompasses higher dimensions. 1962 edition. xxiv+198pp. 5% x 8%. 0-486-42875-3 


CATASTROPHE THEORY FOR SCIENTISTS AND ENGINEERS, Robert 
Gilmore. Advanced-level treatment describes mathematics of theory grounded in the 
work of Poincaré, R. Thom, other mathematicians. Also important applications to 
problems in mathematics, physics, chemistry and engineering. 1981 edition. 
References. 28 tables. 397 black-and-white illustrations. xvii + 666pp. 6'4 x 9's. 
0-486-67539-4 


INTRODUCTION TO DIFFERENCE EQUATIONS, Samuel Goldberg. Exceptionally
clear exposition of important discipline with applications to sociology, psychology,
economics. Many illustrative examples; over 250 problems. 260pp. 5% x 84.
0-486-65084-7


NUMERICAL METHODS FOR SCIENTISTS AND ENGINEERS, Richard 
Hamming. Classic text stresses frequency approach in coverage of algorithms, poly- 
nomial approximation, Fourier approximation, exponential approximation, other 
topics. Revised and enlarged 2nd edition. 721pp. 5% x 84. 0-486-65241-6 


INTRODUCTION TO NUMERICAL ANALYSIS (2nd Edition), F. B. Hilde- 
brand. Classic, fundamental treatment covers computation, approximation, inter- 
polation, numerical differentiation and integration, other topics. 150 new problems. 
669pp. 5% x 84. 0-486-65363-3 


THREE PEARLS OF NUMBER THEORY, A. Y. Khinchin. Three compelling 
puzzles require proof of a basic law governing the world of numbers. Challenges con- 
cern van der Waerden’s theorem, the Landau-Schnirelmann hypothesis and Mann’s 
theorem, and a solution to Waring’s problem. Solutions included. 64pp. 5% x 8/4. 
0-486-40026-3 


THE PHILOSOPHY OF MATHEMATICS: AN INTRODUCTORY ESSAY, 
Stephan Körner. Surveys the views of Plato, Aristotle, Leibniz & Kant concerning 
propositions and theories of applied and pure mathematics. Introduction. Two 
appendices. Index. 198pp. 5% x 84. 0-486-25048-2 




INTRODUCTORY REAL ANALYSIS, A.N. Kolmogorov, S. V. Fomin. Translated 
by Richard A. Silverman. Self-contained, evenly paced introduction to real and func- 
tional analysis. Some 350 problems. 403pp. 5% x 84. 0-486-61226-0 


APPLIED ANALYSIS, Cornelius Lanczos. Classic work on analysis and design of
finite processes for approximating solution of analytical problems. Algebraic equations,
matrices, harmonic analysis, quadrature methods, much more. 559pp. 5% x 8%.
0-486-65656-X


AN INTRODUCTION TO ALGEBRAIC STRUCTURES, Joseph Landin. Superb
self-contained text covers “abstract algebra”: sets and numbers, theory of groups,
theory of rings, much more. Numerous well-chosen examples, exercises. 247pp. 5% x 84.
0-486-65940-2


QUALITATIVE THEORY OF DIFFERENTIAL EQUATIONS, V. V. Nemytskii 
and V.V. Stepanov. Classic graduate-level text by two prominent Soviet mathemati- 
cians covers classical differential equations as well as topological dynamics and 
ergodic theory. Bibliographies. 523pp. 5% x 8. 0-486-65954-2 


THEORY OF MATRICES, Sam Perlis. Outstanding text covering rank, nonsingu- 
larity and inverses in connection with the development of canonical matrices under 
the relation of equivalence, and without the intervention of determinants. Includes 
exercises. 237pp. 5% x 814. 0-486-66810-X 


INTRODUCTION TO ANALYSIS, Maxwell Rosenlicht. Unusually clear, accessi- 
ble coverage of set theory, real number system, metric spaces, continuous functions, 
Riemann integration, multiple integrals, more. Wide range of problems. Under- 
graduate level. Bibliography. 254pp. 5% x 8. 0-486-65038-3 


MODERN NONLINEAR EQUATIONS, Thomas L. Saaty. Emphasizes practical 
solution of problems; covers seven types of equations. “. . . a welcome contribution
to the existing literature. . . .”-Math Reviews. 490pp. 5% x 8°. 0-486-64232-1


MATRICES AND LINEAR ALGEBRA, Hans Schneider and George Phillip 
Barker. Basic textbook covers theory of matrices and its applications to systems of lin- 
ear equations and related topics such as determinants, eigenvalues and differential 
equations. Numerous exercises. 432pp. 5% x 8'2. 0-486-66014-1 


LINEAR ALGEBRA, Georgi E. Shilov. Determinants, linear spaces, matrix alge- 
bras, similar topics. For advanced undergraduates, graduates. Silverman translation. 
387pp. 5% x 8h. 0-486-63518-X 


ELEMENTS OF REAL ANALYSIS, David A. Sprecher. Classic text covers funda- 
mental concepts, real number system, point sets, functions of a real variable, Fourier 
series, much more. Over 500 exercises. 352pp. 5% x 8’. 0-486-65385-4 


SET THEORY AND LOGIC, Robert R. Stoll. Lucid introduction to unified theory 
of mathematical concepts. Set theory and logic seen as tools for conceptual under- 
standing of real number system. 496pp. 5% x 814. 0-486-63829-4 




TENSOR CALCULUS, J.L. Synge and A. Schild. Widely used introductory text 
covers spaces and tensors, basic operations in Riemannian space, non-Riemannian 
spaces, etc. 324pp. 5% x 814. 0-486-63612-7 


ORDINARY DIFFERENTIAL EQUATIONS, Morris Tenenbaum and Harry
Pollard. Exhaustive survey of ordinary differential equations for undergraduates in
mathematics, engineering, science. Thorough analysis of theorems. Diagrams.
Bibliography. Index. 818pp. 5% x 87. 0-486-64940-7


INTEGRAL EQUATIONS, F. G. Tricomi. Authoritative, well-written treatment of 
extremely useful mathematical tool with wide applications. Volterra Equations, 
Fredholm Equations, much more. Advanced undergraduate to graduate level. 
Exercises. Bibliography. 238pp. 5% x 84. 0-486-64828-1 


FOURIER SERIES, Georgi P. Tolstov. Translated by Richard A. Silverman. A valu- 
able addition to the literature on the subject, moving clearly from subject to subject 
and theorem to theorem. 107 problems, answers. 336pp. 5% x 8%. 0-486-63317-9 


INTRODUCTION TO MATHEMATICAL THINKING, Friedrich Waismann.
Examinations of arithmetic, geometry, and theory of integers; rational and natural
numbers; complete induction; limit and point of accumulation; remarkable curves;
complex and hypercomplex numbers, more. 1959 ed. 27 figures. xii+260pp. 5% x 81.
0-486-63317-9


POPULAR LECTURES ON MATHEMATICAL LOGIC, Hao Wang. Noted logi- 
cian’s lucid treatment of historical developments, set theory, model theory, recursion 
theory and constructivism, proof theory, more. 3 appendixes. Bibliography. 1981 edi- 
tion. ix + 283pp. 5% x 84. 0-486-67632-3 


CALCULUS OF VARIATIONS, Robert Weinstock. Basic introduction covering 
isoperimetric problems, theory of elasticity, quantum mechanics, electrostatics, etc. 
Exercises throughout. 326pp. 5% x 8%. 0-486-63069-2 


THE CONTINUUM: A CRITICAL EXAMINATION OF THE FOUNDATION
OF ANALYSIS, Hermann Weyl. Classic of 20th-century foundational research deals
with the conceptual problem posed by the continuum. 156pp. 5% x 8/4.
0-486-67982-9


CHALLENGING MATHEMATICAL PROBLEMS WITH ELEMENTARY
SOLUTIONS, A. M. Yaglom and I. M. Yaglom. Over 170 challenging problems on
probability theory, combinatorial analysis, points and lines, topology, convex polygons,
many other topics. Solutions. Total of 445pp. 5% x 8'4. Two-vol. set.
Vol. I: 0-486-65536-9 Vol. II: 0-486-65537-7


INTRODUCTION TO PARTIAL DIFFERENTIAL EQUATIONS WITH
APPLICATIONS, E. C. Zachmanoglou and Dale W. Thoe. Essentials of partial
differential equations applied to common problems in engineering and the physical
sciences. Problems and answers. 416pp. 5% x 84. 0-486-65251-3


THE THEORY OF GROUPS, Hans J. Zassenhaus. Well-written graduate-level text 
acquaints reader with group-theoretic methods and demonstrates their usefulness in 
mathematics. Axioms, the calculus of complexes, homomorphic mapping, group 
theory, more. 276pp. 5% x 84. 0-486-40922-8 




Math—Decision Theory, Statistics, Probability


ELEMENTARY DECISION THEORY, Herman Chernoff and Lincoln E.
Moses. Clear introduction to statistics and statistical theory covers data processing,
probability and random variables, testing hypotheses, much more. Exercises.
364pp. 5% x 84. 0-486-65218-1


STATISTICS MANUAL, Edwin L. Crow et al. Comprehensive, practical collection 
of classical and modern methods prepared by U.S. Naval Ordnance Test Station. 
Stress on use. Basics of statistics assumed. 288pp. 5% x 84. 0-486-60599-X 


SOME THEORY OF SAMPLING, William Edwards Deming. Analysis of the 
problems, theory and design of sampling techniques for social scientists, industrial 
managers and others who find statistics important at work. 61 tables. 90 figures. xvii 
+602pp. 5% x 84. 0-486-64684-X 


LINEAR PROGRAMMING AND ECONOMIC ANALYSIS, Robert Dorfman, 
Paul A. Samuelson and Robert M. Solow. First comprehensive treatment of linear 
programming in standard economic analysis. Game theory, modern welfare eco- 
nomics, Leontief input-output, more. 525pp. 5% x 844. 0-486-65491-5 


PROBABILITY: AN INTRODUCTION, Samuel Goldberg. Excellent basic text 
covers set theory, probability theory for finite sample spaces, binomial theorem, 
much more. 360 problems. Bibliographies. 322pp. 5% x 84. 0-486-65252-1 


GAMES AND DECISIONS: INTRODUCTION AND CRITICAL SURVEY, 
R. Duncan Luce and Howard Raiffa. Superb nontechnical introduction to game the- 
ory, primarily applied to social sciences. Utility theory, zero-sum games, n-person 
games, decision-making, much more. Bibliography. 509pp. 5% x 8%. 0-486-65943-7 


INTRODUCTION TO THE THEORY OF GAMES, J. C. C. McKinsey. This com- 
prehensive overview of the mathematical theory of games illustrates applications to 
situations involving conflicts of interest, including economic, social, political, and 
military contexts. Appropriate for advanced undergraduate and graduate courses; 
advanced calculus a prerequisite. 1952 ed. x+372pp. 5% x 8%. 0-486-42811-7 


FIFTY CHALLENGING PROBLEMS IN PROBABILITY WITH SOLUTIONS, 
Frederick Mosteller. Remarkable puzzlers, graded in difficulty, illustrate elementary 
and advanced aspects of probability. Detailed solutions. 88pp. 5% x 8’. 0-486-65355-2 


PROBABILITY THEORY: A CONCISE COURSE, Y. A. Rozanov. Highly read- 
able, self-contained introduction covers combination of events, dependent events, 
Bernoulli trials, etc. 148pp. 5% x 814. 0-486-63544-9 


STATISTICAL METHOD FROM THE VIEWPOINT OF QUALITY CON- 
TROL, Walter A. Shewhart. Important text explains regulation of variables, uses of 
statistical control to achieve quality control in industry, agriculture, other areas. 
192pp. 5% x 844. 0-486-65232-7 




Math—Geometry and Topology 


ELEMENTARY CONCEPTS OF TOPOLOGY, Paul Alexandroff. Elegant, intuitive 
approach to topology from set-theoretic topology to Betti groups; how concepts of 
topology are useful in math and physics. 25 figures. 57pp. 5% x 8%. 0-486-60747-X 


COMBINATORIAL TOPOLOGY, P. S. Alexandrov. Clearly written, well-orga- 
nized, three-part text begins by dealing with certain classic problems without using 
the formal techniques of homology theory and advances to the central concept, the 
Betti groups. Numerous detailed examples. 654pp. 5% x 8%. 0-486-40179-0 


EXPERIMENTS IN TOPOLOGY, Stephen Barr. Classic, lively explanation of one 
of the byways of mathematics. Klein bottles, Moebius strips, projective planes, map 
coloring, problem of the Koenigsberg bridges, much more, described with clarity and 
wit. 43 figures. 210pp. 5% x 81. 0-486-25933-1 


THE GEOMETRY OF RENE DESCARTES, René Descartes. The great work 
founded analytical geometry. Original French text, Descartes’s own diagrams, togeth- 
er with definitive Smith-Latham translation. 244pp. 5% x 84. 0-486-60068-8 


EUCLIDEAN GEOMETRY AND TRANSFORMATIONS, Clayton W. Dodge. 
This introduction to Euclidean geometry emphasizes transformations, particularly 
isometries and similarities. Suitable for undergraduate courses, it includes numerous 
examples, many with detailed answers. 1972 ed. viii+296pp. 6'4 x 94. 0-486-43476-1 


PRACTICAL CONIC SECTIONS: THE GEOMETRIC PROPERTIES OF 
ELLIPSES, PARABOLAS AND HYPERBOLAS, J. W. Downs. This text shows how 
to create ellipses, parabolas, and hyperbolas. It also presents historical background on 
their ancient origins and describes the reflective properties and roles of curves in 
design applications. 1993 ed. 98 figures. xii+100pp. 6% x 94. 0-486-42876-1 


THE THIRTEEN BOOKS OF EUCLID’S ELEMENTS, translated with introduc- 
tion and commentary by Sir Thomas L. Heath. Definitive edition. Textual and lin- 
guistic notes, mathematical analysis. 2,500 years of critical commentary. Unabridged. 
1,414pp. 5% x 8%. Three-vol. set. 

Vol. I: 0-486-60088-2 Vol. II: 0-486-60089-0 Vol. III: 0-486-60090-4 


SPACE AND GEOMETRY: IN THE LIGHT OF PHYSIOLOGICAL, 
PSYCHOLOGICAL AND PHYSICAL INQUIRY, Ernst Mach. Three essays by 
an eminent philosopher and scientist explore the nature, origin, and development of 
our concepts of space, with a distinctness and precision suitable for undergraduate 
students and other readers. 1906 ed. vi+148pp. 5% x 81. 0-486-43909-7 


GEOMETRY OF COMPLEX NUMBERS, Hans Schwerdtfeger. Illuminating, 
widely praised book on analytic geometry of circles, the Moebius transformation, 
and two-dimensional non-Euclidean geometries. 200pp. 5% x 8%. 0-486-63830-8 


DIFFERENTIAL GEOMETRY, Heinrich W. Guggenheimer. Local differential geom- 
etry as an application of advanced calculus and linear algebra. Curvature, transforma- 
tion groups, surfaces, more. Exercises. 62 figures. 378pp. 5% x 814. 0-486-63433-7 




History of Math 


THE WORKS OF ARCHIMEDES, Archimedes (T. L. Heath, ed.). Topics include 
the famous problems of the ratio of the areas of a cylinder and an inscribed sphere; 
the measurement of a circle; the properties of conoids, spheroids, and spirals; and the 
quadrature of the parabola. Informative introduction. clxxxvi+326pp. 5% x 814. 
0-486-42084-1 


A SHORT ACCOUNT OF THE HISTORY OF MATHEMATICS, W. W. Rouse 
Ball. One of clearest, most authoritative surveys from the Egyptians and Phoenicians 
through 19th-century figures such as Grassmann, Galois, Riemann. Fourth edition. 
522pp. 5% x 844. 0-486-20630-0 


THE HISTORY OF THE CALCULUS AND ITS CONCEPTUAL DEVELOP- 
MENT, Carl B. Boyer. Origins in antiquity, medieval contributions, work of Newton, 
Leibniz, rigorous formulation. Treatment is verbal. 346pp. 5% x 8%. 0-486-60509-4 


THE HISTORICAL ROOTS OF ELEMENTARY MATHEMATICS, Lucas N. H.
Bunt, Phillip S. Jones, and Jack D. Bedient. Fundamental underpinnings of modern
arithmetic, algebra, geometry and number systems derived from ancient civilizations.
320pp. 5% x 84. 0-486-25563-8


A HISTORY OF MATHEMATICAL NOTATIONS, Florian Cajori. This classic 
study notes the first appearance of a mathematical symbol and its origin, the com- 
petition it encountered, its spread among writers in different countries, its rise to pop- 
ularity, its eventual decline or ultimate survival. Original 1929 two-volume edition 
presented here in one volume. xxviii+820pp. 5% x 8. 0-486-67766-4 


GAMES, GODS & GAMBLING: A HISTORY OF PROBABILITY AND 
STATISTICAL IDEAS, F. N. David. Episodes from the lives of Galileo, Fermat, 
Pascal, and others illustrate this fascinating account of the roots of mathematics. 
Features thought-provoking references to classics, archaeology, biography, poetry. 
1962 edition. 304pp. 5% x 8'4. (Available in U.S. only.) 0-486-40023-9 


OF MEN AND NUMBERS: THE STORY OF THE GREAT 
MATHEMATICIANS, Jane Muir. Fascinating accounts of the lives and accom- 
plishments of history’s greatest mathematical minds—Pythagoras, Descartes, Euler, 
Pascal, Cantor, many more. Anecdotal, illuminating. 30 diagrams. Bibliography. 
256pp. 5% x 8h. 0-486-28973-7 


HISTORY OF MATHEMATICS, David E. Smith. Nontechnical survey from 
ancient Greece and Orient to late 19th century; evolution of arithmetic, geometry, 
trigonometry, calculating devices, algebra, the calculus. 362 illustrations. 1,355pp. 
5% x 8%. Two-vol. set. Vol. I: 0-486-20429-4 Vol. II: 0-486-20430-8 


A CONCISE HISTORY OF MATHEMATICS, Dirk J. Struik. The best brief his- 
tory of mathematics. Stresses origins and covers every major figure from ancient 
Near East to 19th century. 41 illustrations. 195pp. 5% x 814. 0-486-60255-9 




Physics 


OPTICAL RESONANCE AND TWO-LEVEL ATOMS, L. Allen and J. H. Eberly. 
Clear, comprehensive introduction to basic principles behind all quantum optical res- 
onance phenomena. 53 illustrations. Preface. Index. 256pp. 5% x 8%. 0-486-65533-4 


QUANTUM THEORY, David Bohm. This advanced undergraduate-level text pre- 
sents the quantum theory in terms of qualitative and imaginative concepts, followed 
by specific applications worked out in mathematical detail. Preface. Index. 655pp. 
5% x 8b. 0-486-65969-0 


ATOMIC PHYSICS (8th EDITION), Max Born. Nobel laureate’s lucid treatment of 
kinetic theory of gases, elementary particles, nuclear atom, wave-corpuscles, atomic 
structure and spectral lines, much more. Over 40 appendices, bibliography. 495pp. 
5% x Bib. 0-486-65984-4 


A SOPHISTICATE’S PRIMER OF RELATIVITY, P. W. Bridgman. Geared 
toward readers already acquainted with special relativity, this book transcends the 
view of theory as a working tool to answer natural questions: What is a frame of ref- 
erence? What is a “law of nature”? What is the role of the “observer”? Extensive 
treatment, written in terms accessible to those without a scientific background. 1983 
ed. xlviii+172pp. 5% x 8'4. 0-486-42549-5 


AN INTRODUCTION TO HAMILTONIAN OPTICS, H. A. Buchdahl. Detailed 
account of the Hamiltonian treatment of aberration theory in geometrical optics. 
Many classes of optical systems defined in terms of the symmetries they possess. 
Problems with detailed solutions. 1970 edition. xv + 360pp. 5% x 8%. 0-486-67597-1 


PRIMER OF QUANTUM MECHANICS, Marvin Chester. Introductory text 
examines the classical quantum bead on a track: its state and representations; opera- 
tor eigenvalues; harmonic oscillator and bound bead in a symmetric force field; and 
bead in a spherical shell. Other topics include spin, matrices, and the structure of 
quantum mechanics; the simplest atom; indistinguishable particles; and stationary- 
state perturbation theory. 1992 ed. xiv+314pp. 6% x 94. 0-486-42878-8 


LECTURES ON QUANTUM MECHANICS, Paul A. M. Dirac. Four concise, bril- 
liant lectures on mathematical methods in quantum mechanics from Nobel Prize- 
winning quantum pioneer build on idea of visualizing quantum theory through the 
use of classical mechanics. 96pp. 5% x 844. 0-486-41713-1 


THIRTY YEARS THAT SHOOK PHYSICS: THE STORY OF QUANTUM 
THEORY, George Gamow. Lucid, accessible introduction to influential theory of 
energy and matter. Careful explanations of Dirac’s anti-particles, Bohr’s model of the 
atom, much more. 12 plates. Numerous drawings. 240pp. 5% x 84. 0-486-24895-X 


ELECTRONIC STRUCTURE AND THE PROPERTIES OF SOLIDS: THE 
PHYSICS OF THE CHEMICAL BOND, Walter A. Harrison. Innovative text 
offers basic understanding of the electronic structure of covalent and ionic solids, 
simple metals, transition metals and their compounds. Problems. 1980 edition. 
582pp. 6% x 94. 0-486-66021-4 




HYDRODYNAMIC AND HYDROMAGNETIC STABILITY, S. Chandrasekhar. 
Lucid examination of the Rayleigh-Benard problem; clear coverage of the theory of 
instabilities causing convection. 704pp. 5% x 8's. 0-486-64071-X 


INVESTIGATIONS ON THE THEORY OF THE BROWNIAN MOVEMENT, 
Albert Einstein. Five papers (1905-8) investigating dynamics of Brownian motion 
and evolving elementary theory. Notes by R. Fürth. 122pp. 5% x 84. 0-486-60304-0 


THE PHYSICS OF WAVES, William C. Elmore and Mark A. Heald. Unique 
overview of classical wave theory. Acoustics, optics, electromagnetic radiation, more. 
Ideal as classroom text or for self-study. Problems. 477pp. 5% x 8%. 0-486-64926-1 


GRAVITY, George Gamow. Distinguished physicist and teacher takes reader-friendly
look at three scientists whose work unlocked many of the mysteries behind
the laws of physics: Galileo, Newton, and Einstein. Most of the book focuses on
Newton’s ideas, with a concluding chapter on post-Einsteinian speculations concerning
the relationship between gravity and other physical phenomena. 160pp. 5% x 8’.
0-486-42563-0


PHYSICAL PRINCIPLES OF THE QUANTUM THEORY, Werner Heisenberg. 
Nobel Laureate discusses quantum theory, uncertainty, wave mechanics, work of 
Dirac, Schroedinger, Compton, Wilson, Einstein, etc. 184pp. 5% x 84. 0-486-60113-7 


ATOMIC SPECTRA AND ATOMIC STRUCTURE, Gerhard Herzberg. One of 
best introductions; especially for specialist in other fields. Treatment is physical 
rather than mathematical. 80 illustrations. 257pp. 5% x 8’. 0-486-60115-3 


AN INTRODUCTION TO STATISTICAL THERMODYNAMICS, Terrell L.
Hill. Excellent basic text offers wide-ranging coverage of quantum statistical mechanics,
systems of interacting molecules, quantum statistics, more. 523pp. 5% x 8/4.
0-486-65242-4


THEORETICAL PHYSICS, Georg Joos, with Ira M. Freeman. Classic overview 
covers essential math, mechanics, electromagnetic theory, thermodynamics, quan- 
tum mechanics, nuclear physics, other topics. First paperback edition. xxiii + 885pp. 
5% x Bib. 0-486-65227-0 


PROBLEMS AND SOLUTIONS IN QUANTUM CHEMISTRY AND 
PHYSICS, Charles S. Johnson, Jr. and Lee G. Pedersen. Unusually varied problems, 
detailed solutions in coverage of quantum mechanics, wave mechanics, angular 
momentum, molecular spectroscopy, more. 280 problems plus 139 supplementary 
exercises. 430pp. 614 x 94. 0-486-65236-X 


THEORETICAL SOLID STATE PHYSICS, Vol. I: Perfect Lattices in Equilibrium, 
Vol. II: Non-Equilibrium and Disorder, William Jones and Norman H. March. 
Monumental reference work covers fundamental theory of equilibrium properties of 
perfect crystalline solids, non-equilibrium properties, defects and disordered systems. 
Appendices. Problems. Preface. Diagrams. Index. Bibliography. Total of 1,301pp. 5% 
x 84. Two volumes. Vol. I: 0-486-65015-4 Vol. II: 0-486-65016-2 


WHAT IS RELATIVITY? L. D. Landau and G. B. Rumer. Written by a Nobel Prize 
physicist and his distinguished colleague, this compelling book explains the special 
theory of relativity to readers with no scientific background, using such familiar 
objects as trains, rulers, and clocks. 1960 ed. vit+72pp. 5% x 814. 0-486-42806-0 




A TREATISE ON ELECTRICITY AND MAGNETISM, James Clerk Maxwell. 
Important foundation work of modern physics. Brings to final form Maxwell’s theo- 
ry of electromagnetism and rigorously derives his general equations of field theory. 
1,084pp. 5% x 8%. Two-vol. set. Vol. I: 0-486-60636-8 Vol. II: 0-486-60637-6 


QUANTUM MECHANICS: PRINCIPLES AND FORMALISM, Roy McWeeny. 
Graduate student-oriented volume develops subject as fundamental discipline, opening
with review of origins of Schrödinger’s equations and vector spaces. Focusing on
main principles of quantum mechanics and their immediate consequences, it concludes
with final generalizations covering alternative “languages” or representations. 
1972 ed. 15 figures. xit+155pp. 5% x 84. 0-486-42829-X 


INTRODUCTION TO QUANTUM MECHANICS With Applications to 
Chemistry, Linus Pauling & E. Bright Wilson, Jr. Classic undergraduate text by Nobel 
Prize winner applies quantum mechanics to chemical and physical problems. 
Numerous tables and figures enhance the text. Chapter bibliographies. Appendices. 
Index. 468pp. 5% x 8'4. 0-486-64871-0 


METHODS OF THERMODYNAMICS, Howard Reiss. Outstanding text focuses
on physical technique of thermodynamics, typical problem areas of understanding,
and significance and use of thermodynamic potential. 1965 edition. 238pp. 5% x 844.
0-486-69445-3


THE ELECTROMAGNETIC FIELD, Albert Shadowitz. Comprehensive under- 
graduate text covers basics of electric and magnetic fields, builds up to electromag- 
netic theory. Also related topics, including relativity. Over 900 problems. 768pp. 
5% x 8h. 0-486-65660-8 


GREAT EXPERIMENTS IN PHYSICS: FIRSTHAND ACCOUNTS FROM 
GALILEO TO EINSTEIN, Morris H. Shamos (ed.). 25 crucial discoveries: Newton’s 
laws of motion, Chadwick’s study of the neutron, Hertz on electromagnetic waves, 
more. Original accounts clearly annotated. 370pp. 5% x 814. 0-486-25346-5 


EINSTEIN’S LEGACY, Julian Schwinger. A Nobel Laureate relates fascinating 
story of Einstein and development of relativity theory in well-illustrated, nontechni- 
cal volume. Subjects include meaning of time, paradoxes of space travel, gravity and 
its effect on light, non-Euclidean geometry and curving of space-time, impact of radio 
astronomy and space-age discoveries, and more. 189 b/w illustrations. xiv+250pp. 
8% x 94. 0-486-41974-6 


STATISTICAL PHYSICS, Gregory H. Wannier. Classic text combines thermody- 
namics, statistical mechanics and kinetic theory in one unified presentation of thermal 
physics. Problems with solutions. Bibliography. 532pp. 5% x 84. 0-486-65401-X 


Paperbound unless otherwise indicated. Available at your book dealer, online at 
www.doverpublications.com, or by writing to Dept. GI, Dover Publications, Inc., 31 East 
2nd Street, Mineola, NY 11501. For current price information or for free catalogues (please indi- 
cate field of interest), write to Dover Publications or log on to www.doverpublications.com 
and see every Dover book in print. Dover publishes more than 500 books each year on science, 
elementary and advanced mathematics, biology, music, art, literary history, social sciences, and 
other areas. 




(continued from front flap) 


SPECIAL FUNCTIONS AND THEIR APPLICATIONS, N. N. Lebedev. (60624-4) 
CHANCE, LUCK AND STATISTICS, Horace C. Levinson. (41997-5) 


TENSORS, DIFFERENTIAL FORMS, AND VARIATIONAL PRINCIPLES, David Lovelock 
and Hanno Rund. (65840-6) 


SURVEY OF MATRIX THEORY AND MATRIX INEQUALITIES, Marvin Marcus and 
Henryk Minc. (67102-X) 


ABSTRACT ALGEBRA AND SOLUTION BY RADICALS, John E. and Margaret W. 
Maxfield. (67121-6) 


FUNDAMENTAL CONCEPTS OF ALGEBRA, Bruce E. Meserve. (61470-0) 
FUNDAMENTAL CONCEPTS OF GEOMETRY, Bruce E. Meserve. (63415-9) 


FIFTY CHALLENGING PROBLEMS IN PROBABILITY WITH SOLUTIONS, Frederick 
Mosteller. (65355-2) 


NUMBER THEORY AND ITS HISTORY, Oystein Ore. (65620-9) 
MATRICES AND TRANSFORMATIONS, Anthony J. Pettofrezzo. (63634-8) 
PROBABILITY THEORY: A CONCISE COURSE, Y. A. Rozanov. (63544-9) 


ORDINARY DIFFERENTIAL EQUATIONS AND STABILITY THEORY: AN INTRODUCTION, 
David A. Sanchez. (63828-6) 


LINEAR ALGEBRA, Georgi E. Shilov. (63518-X) 

ESSENTIAL CALCULUS WITH APPLICATIONS, Richard A. Silverman. (66097-4) 

A CONCISE HISTORY OF MATHEMATICS, Dirk J. Struik. (60255-9) 

PROBLEMS IN PROBABILITY THEORY, MATHEMATICAL STATISTICS AND THEORY OF 
RANDOM FUNCTIONS, A. A. Sveshnikov. (63717-4) 

TENSOR CALCULUS, J. L. Synge and A. Schild. (63612-7) 

CALCULUS OF VARIATIONS WITH APPLICATIONS TO PHYSICS AND ENGINEERING, Robert 
Weinstock. (63069-2) 

INTRODUCTION TO VECTOR AND TENSOR ANALYSIS, Robert C. Wrede. (61879-X) 

DISTRIBUTION THEORY AND TRANSFORM ANALYSIS, A. H. Zemanian. (65479-6) 


Paperbound unless otherwise indicated. Available at your book dealer, 
online at www.doverpublications.com, or by writing to Dept. 23, Dover 
Publications, Inc., 31 East 2nd Street, Mineola, NY 11501. For current 
price information or for free catalogues (please indicate field of interest), 
write to Dover Publications or log on to www.doverpublications.com 
and see every Dover book in print. Each year Dover publishes over 500 
books on fine art, music, crafts and needlework, antiques, languages, lit- 
erature, children’s books, chess, cookery, nature, anthropology, science, 
mathematics, and other areas. 

Manufactured in the U.S.A. 


WILLIAM J. LEVEQUE 


TOPICS IN 
NUMBER 
THEORY 


VOLUMES I AND II 


This classic two-part work—now available in one book—assumes no 
prior theoretical knowledge on the reader’s part. Developing the sub- 
ject of number theory from the very beginning, the text provides an 
introduction to some of the important techniques and results of clas- 
sical and modern number theory. 


Volume I is a self-contained treatment, suitable as a first course for 
advanced undergraduates and beginning graduate students. 
Although calculus is used in two of the nine chapters, no other tech- 
nical knowledge is required. 


The advanced topics of Volume II require a much higher level of 
mathematical maturity than the previous section, including a work- 
ing knowledge of the theory of analytic functions. Its contents range 
from chapters on binary quadratic forms, algebraic numbers, and 
applications to rational number theory to the Thue-Siegel-Roth 
Theorem, irrationality and transcendence, Dirichlet’s Theorem, and 
the Prime Number Theorem. The author reinforces his teachings 
throughout both sections by posing numerous problems and offering 
hints for their solution. 


Unabridged Dover (2002) republication of the edition originally pub- 
lished by the Addison-Wesley Publishing Company, Inc., Reading, 
Massachusetts, 1956. Preface. Supplementary Reading. List of 
Symbols. Index. 496pp. 5⅜ x 8½. Paperbound. 


ALSO AVAILABLE 


ELEMENTARY THEORY OF NUMBERS, William Judson LeVeque. 144pp. 5⅜ x 8. 
66348-5 

FUNDAMENTALS OF NUMBER THEORY, William Judson LeVeque. 288pp. 5⅜ x 8½. 
68906-9 

NUMBER THEORY, George E. Andrews. 259pp. 5⅜ x 8½. 68252-8 


For current price information write to Dover Publications, or log on to 
www.doverpublications.com—and see every Dover book in print. 


ISBN 0-486-42539-8 


$24.95 IN USA 
$37.50 IN CANADA 


SASDNAW ‘'V AsSSar ASG NOISAG YSAOD 


